We consider the problem of high-dimensional Ising (graphical) model selection. We propose a simple algorithm for structure estimation based on the thresholding of the empirical conditional variation distances. We introduce a novel criterion for tractable graph families, where this method is efficient, based on the presence of sparse local separators between node pairs in the underlying graph. For such graphs, the proposed algorithm has a sample complexity of $n = \Omega(J_{\min}^{-2}\log p)$, where $p$ is the number of variables and $J_{\min}$ is the minimum (absolute) edge potential in the model. We also establish non-asymptotic necessary and sufficient conditions for structure estimation.


1. Introduction. The use of probabilistic graphical models allows for succinct representation of high-dimensional distributions, where the conditional-independence relationships among the variables are represented by a graph. Such models have found many applications in a variety of areas including computer vision [15], bio-informatics [24], financial modeling [16] and social networks [29]. For instance, graphical models are employed for contextual object recognition to improve detection performance based on object co-occurrences [15] and for modeling opinion formation and technology adoption in social networks [29, 35].

A major challenge involving graphical models is structure estimation given samples drawn from the model. It is known that such a learning task is NP-hard [8, 32]. This challenge is compounded in the high-dimensional regime, where the number of available observations is typically much smaller than the number of dimensions (or variables). It is thus imperative to design efficient algorithms for structure estimation of graphical models with low sample complexity.

In their seminal work, Chow and Liu presented an efficient algorithm for structure estimation of tree-structured graphical models based on a maximum weight spanning tree algorithm [17]. Since then, various algorithms have been proposed for structure estimation of sparse graphical models. They can be broadly classified into two categories: local algorithms [11, 46] and those based on convex relaxation [12, 42, 48, 49]. The former approach is typically based on local search, while the latter involves solving a penalized convex optimization problem. See Section 1.2 for a detailed discussion of these approaches.

In this paper, we propose a novel local algorithm and analyze its performance for structure estimation of Ising models, which are pairwise binary graphical models. Our proposed algorithm circumvents one of the primary limitations of existing local algorithms [11, 46] for consistent estimation in high dimensions, namely the requirement that the graphs have a bounded degree as the number of nodes $p$ tends to infinity. We give a precise characterization of the class of graphs which can be consistently recovered by our algorithm with low computational and sample complexities. We demonstrate that a fundamental property shared by these graphs is that they have sparse local vertex separators between any two non-neighbors in the graph. A wide variety of graphs satisfy this property. These include large-girth graphs, the Erdos-Renyi random graphs¹ [9] and the power-law graphs [20], as well as graphs with short cycles such as the small-world graphs [58] and other hybrid graphs [20, Ch. 12].

Our results are applicable in the realms of social networks, bio-informatics, computer vision and so on. Here, we elaborate on their relevance to social networks. The aforementioned graphs (i.e., the power-law and the small-world graphs) have been employed extensively for modeling the topologies of social networks [2, 47]. More recently, Ising models on such topologies have been employed for modeling various phenomena in social networks [55] such as opinion formation [26, 29, 38] and technology adoption [35]. A concrete example is the use of an Ising model for the U.S. Senate voting network [61]. The nodes of the graph represent the senators and the data are the voting decisions made by the senators. Estimating the graph reveals interesting relationships between the senators and the effect of political affiliations on their decisions. Similarly, in many other scenarios (e.g., online social networks), we have access to a sequence of measurements at the nodes of the network. For instance, we may gather the opinions of different users or measure the popularity of new technologies. As a first-order approximation, we can regard such a sequence of measurements as independent and identically distributed (i.i.d.) samples drawn from an Ising model. Our findings imply that the topology of such social-network models can be efficiently estimated under some mild and transparent conditions.

1.1. Summary of Results. Our main contributions in this work are threefold. First, we propose a simple local algorithm for structure estimation of Ising models, based on a set of conditional variation distance threshold tests. Second, we derive sample complexity results for consistent structure estimation in high dimensions. Third, we prove novel lower bounds on the sample complexity required for any learning algorithm to be consistent for model selection.

We propose an algorithm for structure estimation, termed conditional variation distance thresholding (CVDT), which first computes, for a given node pair, the minimum of the empirical conditional variation distance in (14) over conditioning sets of bounded cardinality $\eta$. Second, if the minimum exceeds a given threshold (depending on the number of samples $n$ and the number of nodes $p$), the node pair is declared an edge. This test has a computational complexity of $O(p^{\eta+2})$. Thus, the computational complexity is low if $\eta$ is small. Further, it requires only low-order statistics (up to order $\eta + 2$). We establish that the parameter $\eta$ is a bound on the size of local vertex separators between any two non-neighbors in the graph, and is small for many common graph families, as discussed previously.
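To make the $O(p^{\eta+2})$ count concrete, the following minimal sketch counts the triples $(i, j, S)$ examined by the test, assuming ordered pairs $i \neq j$ and all conditioning sets $S \subset V\setminus\{i,j\}$ with $|S| \leq \eta$. The helper name `num_conditional_tests` is hypothetical. Each set additionally requires a minimum over $2^{|S|}$ configurations, which is only a constant factor when $\eta = O(1)$.

```python
from math import comb


def num_conditional_tests(p: int, eta: int) -> int:
    """Count triples (i, j, S): ordered pairs i != j and conditioning sets
    S contained in V minus {i, j} with |S| <= eta."""
    ordered_pairs = p * (p - 1)
    sets_per_pair = sum(comb(p - 2, k) for k in range(eta + 1))
    return ordered_pairs * sets_per_pair


if __name__ == "__main__":
    for p in (10, 100, 1000):
        count = num_conditional_tests(p, eta=2)
        # The ratio to p**(eta + 2) stays bounded; it tends to 1/eta! = 0.5 here.
        print(p, count, count / p ** 4)
```

For $\eta = 2$ the printed ratio approaches $1/\eta! = 1/2$ as $p$ grows, which is consistent with the stated polynomial scaling.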

We establish that under a set of mild and transparent assumptions, structure learning is consistent in high dimensions for CVDT when the number of samples scales as $n = \Omega(J_{\min}^{-2}\log p)$ for a $p$-node graph, where $J_{\min}$ is the minimum (absolute) edge potential of the Ising model. We relate the conditions for successful graph recovery to certain phase transitions in the Ising model. We also derive (non-asymptotic) PAC guarantees for CVDT and provide explicit results for specific graph families.



¹The Erdos-Renyi graphs have sparse local vertex separators asymptotically almost surely (a.a.s.) with respect to the random graph measure. Indeed, whenever we mention ensembles of random graphs in the sequel, our statements are taken to hold a.a.s.



We also derive a lower bound (necessary condition) on the sample complexity required for consistent structure learning with positive probability by any algorithm. We prove that $n = \Omega(c\log p)$ samples are required by any algorithm to ensure consistent learning of Erdos-Renyi random graphs, where $c$ is the average degree and $p$ is the number of nodes. We also present a non-asymptotic necessary condition which employs information-theoretic techniques such as Fano's inequality and typicality. We also provide results for other graph families such as the girth-bounded graphs and augmented graphs.

Our results have several ramifications: we characterize the tradeoff between various graph parameters, such as the maximum degree, the threshold for local path length and the strength of edge potentials, for efficient and consistent structure estimation. For instance, we establish a natural relationship between the maximum degree and the girth of a graph for consistent estimation: graphs with large degrees can be consistently estimated by our algorithm when they also have large girths. Indeed, in the extreme case of trees, which have infinite girth, consistent estimation is possible with no constraint on the node degrees, corroborating the initial observation by Chow and Liu [17]. We also derive stronger guarantees for many random-graph families. For instance, for the Erdos-Renyi random graph family and the small-world family (which is the union of a $d$-dimensional grid and an Erdos-Renyi random graph), the minimum sample complexity scales as $n = \Omega(c^2\log p)$, where $c$ is the average degree of the Erdos-Renyi random graph. Thus, when the average degree is bounded ($c = O(1)$), the sample complexity of our algorithm scales as $n = \Omega(\log p)$. Recall that the sample complexity of learning tree models is $\Omega(\log p)$ [53]. Thus, we establish that the complexity of learning sparse random graphs using the proposed algorithm is akin to learning tree models in certain parameter regimes.

Our sufficient conditions for consistent structure estimation impose transparent constraints on 
the graph structure and the parameters. The structural property is related to the presence of 
sparse local vertex separators between non-adjacent node pairs in the graph. The conditions on 
the parameters require that the edge potentials of the Ising model be below a certain threshold, 
which we explicitly characterize. In fact, we establish that below this threshold, the effect of long-range
paths in the model decays and that graph estimation is feasible via local conditioning, as
prescribed by our algorithm. Similar notions have been previously established in other contexts, 
e.g., to establish polynomial mixing time for Gibbs sampling of the Ising model [37]. We compare 
these different criteria and show that we can guarantee consistent learning in high dimensions 
under weaker conditions than those required for polynomial mixing of Gibbs sampling. Ours is the 
first work (to the best of the authors' knowledge) to establish such explicit connections between 
structure estimation and the statistical physics properties (i.e., phase transitions) of Ising models. 
Establishing these results requires the development and use of tools (e.g., self-avoiding walk trees) 
not previously employed for learning problems. 

1.2. Related Work. The problem of structure estimation of a general graphical model is NP-hard [8, 32]. However, for tree-structured graphical models, maximum-likelihood (ML) estimation can be implemented efficiently via the Chow-Liu algorithm [17], since ML estimation reduces to a maximum-weight spanning tree problem where the edge weights are the empirical mutual information quantities computed from samples. It can be established that the sample complexity for the Chow-Liu algorithm scales as $n = O(\log p)$, where $p$ is the number of variables [53]. Error-exponent analysis of the Chow-Liu algorithm was performed in [52, 54], and extensions to general acyclic models [39, 53] and trees with latent (or hidden) variables [16] have also been studied recently.
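The Chow-Liu procedure described above can be sketched as follows. This is a minimal illustration, assuming discrete samples stored as equal-length tuples and plug-in (empirical) mutual information in nats; the function names are hypothetical. Kruskal's algorithm with a union-find structure is used here as one standard way to compute the maximum-weight spanning tree, not as the specific implementation of [17].

```python
import itertools
import math
from collections import Counter


def empirical_mutual_information(xs, ys):
    """Plug-in mutual information (in nats) between two paired sample columns."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def chow_liu_edges(samples):
    """Maximum-weight spanning tree over empirical mutual information (Kruskal).

    samples: list of equal-length tuples, one tuple per i.i.d. observation.
    Returns the tree as a set of pairs (i, j) with i < j.
    """
    p = len(samples[0])
    columns = list(zip(*samples))
    candidates = sorted(
        ((empirical_mutual_information(columns[i], columns[j]), i, j)
         for i, j in itertools.combinations(range(p), 2)),
        reverse=True,  # heaviest edges first
    )
    parent = list(range(p))  # union-find forest used to detect cycles

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    tree = set()
    for _, i, j in candidates:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # (i, j) joins two components, so no cycle is formed
            parent[root_i] = root_j
            tree.add((i, j))
    return tree
```

On data whose empirical law is a four-node $\pm 1$ Markov chain, the recovered tree is the chain itself, since adjacent pairs carry the largest empirical mutual information.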

Given the feasibility of structure learning of tree models, a natural extension is to consider learning the structures of junction trees.² Efficient algorithms have been previously proposed for learning junction trees with bounded treewidth (e.g., [13]). However, the complexity of these algorithms is exponential in the treewidth, and hence they are not practical when the graphs have unbounded treewidth.³

There are mainly two classes of algorithms for graphical model selection: local-search based approaches [11, 46] and those based on convex optimization [12, 42, 48, 49]. The latter approach typically incorporates an $\ell_1$ penalty term to encourage sparsity in the graph structure. In [48], structure estimation of Ising models is considered, where neighborhood selection for each node is performed based on $\ell_1$-penalized logistic regression. It was shown that this algorithm has a sample complexity of $n = \Omega(\Delta^3\log p)$, where $\Delta$ is the maximum degree, under a set of so-called "incoherence" conditions. However, the incoherence conditions are not easy to interpret and are NP-hard to verify in general models [6]. For a more detailed comparison, see Section 3.5.

In contrast to convex-relaxation approaches, the local-search based approach relies on a series of simple local tests for neighborhood selection at individual nodes. For instance, the work in [11] performs neighborhood selection at each node based on a series of conditional-independence tests. Abbeel et al. [1] propose an algorithm, similar in spirit, for learning factor graphs with bounded degree. The works in [51] and [14] consider conditional-independence tests for learning Bayesian networks. In [46], the authors suggest an alternative greedy algorithm, based on minimizing conditional entropy, for graphs with large girth and bounded degree. However, these works [1, 11, 14, 46, 51] require the maximum degree in the graph to be bounded ($\Delta = O(1)$), which may be restrictive in practical scenarios. We consider graphical model selection on graphs where the maximum degree is allowed to grow with the number of nodes (albeit at a controlled rate). Moreover, we establish a natural tradeoff between the maximum degree and other parameters of the graph (e.g., girth) required for consistent structure estimation.

Necessary conditions on structure learning provide lower bounds on the sample complexity and have been studied in [44, 50, 57]. However, a standard assumption that these works make is that the underlying graph is drawn uniformly from a set of graphs with bounded degree. For this scenario, it is shown that $n = \Omega(\Delta^k\log p)$ samples are required for consistent structure estimation of a graph with $p$ nodes and maximum degree $\Delta$, for some $k \in \mathbb{N}$, say $k = 3$ or $4$. In contrast, our converse result is stated in terms of the average degree instead of the maximum degree.

2. System Model. In this section, we define the relevant notation to be used in the rest of 
the paper. 

2.1. Notation. We introduce some basic notions. Let $\|\cdot\|_1$ denote the $\ell_1$ norm. For any two discrete distributions $P$, $Q$ on the same alphabet $\mathcal{X}$, the total variation distance is given by

$$\nu(P,Q) := \frac{1}{2}\|P-Q\|_1 = \frac{1}{2}\sum_{x\in\mathcal{X}}|P(x)-Q(x)|, \tag{1}$$



²Junction trees are formed by triangulating a given graph, and their nodes correspond to the maximal cliques of the triangulated graph [56]. The treewidth of a graph is one less than the minimum possible size of the maximum clique in the triangulated graph over all possible triangulations.

³For instance, it is known that for an Erdos-Renyi random graph $G_p \sim \mathcal{G}(p, c/p)$ with $c > 1$, the treewidth is greater than $p^{\epsilon}$ for some $\epsilon > 0$ [34].



and the Kullback-Leibler distance (or relative entropy) is given by

$$D(P\,\|\,Q) := \sum_{x\in\mathcal{X}} P(x)\log\frac{P(x)}{Q(x)}.$$
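Both distances translate directly into code. The sketch below assumes pmfs stored as Python dictionaries mapping outcomes to probabilities (a representation chosen only for illustration), with missing keys treated as probability zero; the natural logarithm is used, so $D$ is in nats.

```python
import math


def total_variation(P, Q):
    """nu(P, Q) = (1/2) * sum_x |P(x) - Q(x)|, as in (1)."""
    support = set(P) | set(Q)
    return 0.5 * sum(abs(P.get(x, 0.0) - Q.get(x, 0.0)) for x in support)


def kl_divergence(P, Q):
    """D(P || Q) = sum_x P(x) log(P(x) / Q(x)) in nats.

    Returns math.inf when P puts mass where Q does not.
    """
    total = 0.0
    for x, px in P.items():
        if px == 0.0:
            continue  # convention: 0 * log(0 / q) = 0
        qx = Q.get(x, 0.0)
        if qx == 0.0:
            return math.inf
        total += px * math.log(px / qx)
    return total
```

For example, with $P(+1) = 0.7$, $P(-1) = 0.3$ and $Q$ uniform on $\{-1,+1\}$, `total_variation(P, Q)` evaluates to $0.2$ (up to rounding).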



Given a pair of discrete random variables $(X, Y)$ taking values in the set $\mathcal{X}\times\mathcal{Y}$ and distributed as $P = P_{X,Y}$, the mutual information is defined as

$$I(X;Y) := D\big(P(x,y)\,\|\,P(x)P(y)\big) = \sum_{x\in\mathcal{X},\,y\in\mathcal{Y}} P(x,y)\log\frac{P(x,y)}{P(x)P(y)}. \tag{2}$$

Along similar lines, the conditional mutual information of $X$ and $Y$ given another random variable $Z$, taking values in a countable set $\mathcal{Z}$, is defined as

$$I(X;Y\,|\,Z) := \sum_{x\in\mathcal{X},\,y\in\mathcal{Y},\,z\in\mathcal{Z}} P(x,y,z)\log\frac{P(x,y\,|\,z)}{P(x\,|\,z)\,P(y\,|\,z)}. \tag{3}$$

It is also well known that $I(X;Y\,|\,Z) = 0$ if and only if $X$ and $Y$ are independent given $Z$, i.e., $P(x,y\,|\,z) = P(x\,|\,z)\,P(y\,|\,z)$.
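As a small sanity check of (2), (3) and the independence characterization, the sketch below computes $I(X;Y\,|\,Z)$ from a joint pmf stored as a dictionary over $(x, y, z)$ triples (a hypothetical representation) and obtains $I(X;Y)$ as the special case of a constant $Z$. On a chain $X - Z - Y$ it returns zero for $I(X;Y\,|\,Z)$, up to floating-point error, but a positive $I(X;Y)$.

```python
import math
from collections import defaultdict


def conditional_mutual_information(pxyz):
    """I(X; Y | Z) in nats from a joint pmf {(x, y, z): probability}, as in (3)."""
    pz, pxz, pyz = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x, y, z), p in pxyz.items():
        pz[z] += p
        pxz[x, z] += p
        pyz[y, z] += p
    # P(x,y|z) / (P(x|z) P(y|z)) equals P(x,y,z) P(z) / (P(x,z) P(y,z)).
    return sum(p * math.log(p * pz[z] / (pxz[x, z] * pyz[y, z]))
               for (x, y, z), p in pxyz.items() if p > 0)


def mutual_information(pxy):
    """I(X; Y) as in (2): conditional mutual information with a constant Z."""
    return conditional_mutual_information({(x, y, 0): p for (x, y), p in pxy.items()})
```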

Given $n$ samples drawn i.i.d. from $P(x,y)$, denoted by $(x^n, y^n) = \{(x_i, y_i)\}_{i=1}^{n}$, the (joint) empirical distribution or the (joint) type is defined as

$$\widehat{P}^n(x,y;x^n,y^n) := \frac{1}{n}\sum_{i=1}^{n}\mathbb{I}\{(x,y) = (x_i,y_i)\}. \tag{4}$$



We loosely use the term empirical distance to refer to distances between empirical distributions. For instance, the empirical variation distance is given by

$$\nu(\widehat{P}^n,\widehat{Q}^n) := \frac{1}{2}\sum_{x\in\mathcal{X}}\big|\widehat{P}^n(x)-\widehat{Q}^n(x)\big|. \tag{5}$$
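The type in (4) and the empirical variation distance in (5) can be computed by counting. A minimal sketch, assuming samples are given as a list of hashable values (e.g., tuples $(x_i, y_i)$ for a joint type); the function names are illustrative only.

```python
from collections import Counter


def empirical_distribution(samples):
    """Type (4): relative frequency of each distinct value among the n samples."""
    n = len(samples)
    return {value: count / n for value, count in Counter(samples).items()}


def empirical_variation_distance(samples_p, samples_q):
    """Empirical variation distance (5) between the types of two sample lists."""
    P_hat = empirical_distribution(samples_p)
    Q_hat = empirical_distribution(samples_q)
    support = set(P_hat) | set(Q_hat)
    return 0.5 * sum(abs(P_hat.get(x, 0.0) - Q_hat.get(x, 0.0)) for x in support)
```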

Our algorithm for graph estimation will be based on empirical variation distances between conditional distributions. We employ such empirical estimates for testing conditional-independence relationships among the variables.

2.2. Ising Models. A graphical model is a family of multivariate distributions which are Markov in accordance with a particular undirected graph [36]. Each node $i \in V$ in the graph is associated with a random variable $X_i$ taking values in a set $\mathcal{X}$. The set of edges⁴ $E \subset \binom{V}{2}$ captures the set of conditional-independence relationships among the random variables. We say that a vector of random variables $\mathbf{X} := (X_1, \ldots, X_p)$ with a joint probability mass function (pmf) $P$ is Markov on the graph $G$ if the local Markov property

$$P(x_i\,|\,\mathbf{x}_{\mathcal{N}(i)}) = P(x_i\,|\,\mathbf{x}_{V\setminus i}) \tag{6}$$

holds for all nodes $i \in V$. More generally, we say that $P$ satisfies the global Markov property if, for all disjoint sets $A, B \subset V$ such that $A \cap \mathcal{N}(B) = \mathcal{N}(A) \cap B = \emptyset$, we have

$$P(\mathbf{x}_A, \mathbf{x}_B\,|\,\mathbf{x}_{S(A,B;G)}) = P(\mathbf{x}_A\,|\,\mathbf{x}_{S(A,B;G)})\,P(\mathbf{x}_B\,|\,\mathbf{x}_{S(A,B;G)}), \tag{7}$$



⁴We use the notations $E$ and $G$ interchangeably to denote the set of edges.



where the set $S(A,B;G)$ is a node separator⁵ between $A$ and $B$, and $\mathcal{N}(A)$ denotes the neighborhood of $A$ in $G$. The local and global Markov properties are equivalent under the positivity condition, given by $P(\mathbf{x}) > 0$ for all $\mathbf{x} \in \mathcal{X}^p$ [36].

The Hammersley-Clifford theorem [10] states that under the positivity condition, a distribution $P$ satisfies the Markov property according to a graph $G$ if and only if it factorizes according to the cliques of $G$, i.e.,

$$P(\mathbf{x}) = \frac{1}{Z}\exp\Big(\sum_{c\in\mathcal{C}}\Psi_c(\mathbf{x}_c)\Big), \tag{8}$$

where $\mathcal{C}$ is the set of cliques of $G$ and $\mathbf{x}_c$ is the set of random variables on clique $c$. The quantity $Z$ is known as the partition function and serves to normalize the probability distribution. The functions $\Psi_c$ are known as potential functions. An important class of graphical models is the class of pairwise models, which factorize according to the edges of the graph:

$$P(\mathbf{x}) = \frac{1}{Z}\exp\Big(\sum_{e\in E}\Psi_e(\mathbf{x}_e)\Big). \tag{9}$$



One of the most well-studied pairwise models is the Ising model. Here, each random variable $X_i$ takes values in the set $\mathcal{X} = \{-1,+1\}$ and the probability mass function (pmf) is given by

$$P(\mathbf{x}) = \frac{1}{Z}\exp\Big(\frac{1}{2}\mathbf{x}^T J_G\,\mathbf{x} + \mathbf{h}^T\mathbf{x}\Big), \quad \mathbf{x} \in \{-1,1\}^p, \tag{10}$$

where $J_G$ is known as the potential matrix and $\mathbf{h}$ as the potential vector. By convention, $J(i,i) = 0$ for all $i \in V$. The sparsity pattern of $J_G$ corresponds to that of the graph $G$, i.e., $J_{ij} = 0$ for $(i,j) \notin G$. A model is said to be attractive or ferromagnetic if $J_{ij} \geq 0$ and $h_i \geq 0$ for all $i, j \in V$. An Ising model is said to be symmetric if $\mathbf{h} = \mathbf{0}$.
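For small $p$, the pmf in (10) can be evaluated exactly by enumerating all $2^p$ configurations to obtain the partition function $Z$. The sketch below only illustrates the definition (its cost is exponential in $p$); it assumes $J_G$ is given as a symmetric list of lists with zero diagonal and $\mathbf{h}$ as a list, and the function name `ising_pmf` is hypothetical.

```python
import itertools
import math


def ising_pmf(J, h):
    """Exact Ising pmf (10) by enumerating all of {-1, +1}^p.

    J: symmetric p x p list of lists with zero diagonal (J[i][j] = 0 for non-edges).
    h: list of p node potentials.
    Returns {x: P(x)} keyed by tuples x of +/-1 values; cost grows as 2^p.
    """
    p = len(h)
    weights = {}
    for x in itertools.product((-1, 1), repeat=p):
        quadratic = 0.5 * sum(J[a][b] * x[a] * x[b] for a in range(p) for b in range(p))
        linear = sum(h[a] * x[a] for a in range(p))
        weights[x] = math.exp(quadratic + linear)
    Z = sum(weights.values())  # partition function
    return {x: w / Z for x, w in weights.items()}
```

For two nodes with $J_{12} = 0.5$ and $\mathbf{h} = \mathbf{0}$, the four probabilities are proportional to $e^{0.5}$ for agreeing spins and $e^{-0.5}$ for disagreeing spins, as (10) prescribes.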

We assume that there exist $J_{\min}, J_{\max} \in \mathbb{R}^+$ such that the absolute values of the edge potentials are uniformly bounded, i.e.,

$$|J_{ij}| \in [J_{\min}, J_{\max}], \quad \forall (i,j) \in G. \tag{11}$$

We can provide guarantees on structure recovery, subject to conditions on $J_{\min}$ and $J_{\max}$. We assume that the node potentials $h_i$ are uniformly bounded away from $\pm\infty$.

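Given an Ising model, nodes $i, j \in V$ and a subset $S \subset V\setminus\{i,j\}$, we define the conditional variation distance as

$$\nu_{i|j;S} := \min_{\mathbf{x}_S\in\{\pm1\}^{|S|}} \nu\big(P(X_i\,|\,X_j = +, \mathbf{X}_S = \mathbf{x}_S),\, P(X_i\,|\,X_j = -, \mathbf{X}_S = \mathbf{x}_S)\big) \tag{12}$$

$$\nu_{i|j;S} = \min_{\mathbf{x}_S\in\{\pm1\}^{|S|}} \frac{1}{2}\sum_{x_i=\pm1}\big|P(X_i = x_i\,|\,X_j = +, \mathbf{X}_S = \mathbf{x}_S) - P(X_i = x_i\,|\,X_j = -, \mathbf{X}_S = \mathbf{x}_S)\big|. \tag{13}$$

The empirical conditional variation distance $\widehat{\nu}^n_{i|j;S}$ is defined by replacing the actual distributions with their empirical versions:

$$\widehat{\nu}^n_{i|j;S} := \min_{\mathbf{x}_S\in\{\pm1\}^{|S|}} \nu\big(\widehat{P}^n(X_i\,|\,X_j = +, \mathbf{X}_S = \mathbf{x}_S),\, \widehat{P}^n(X_i\,|\,X_j = -, \mathbf{X}_S = \mathbf{x}_S)\big). \tag{14}$$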
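A direct, unoptimized way to evaluate (14) from $\pm 1$ samples is sketched below. For binary $X_i$, the variation distance between two conditional laws reduces to the absolute difference of the conditional probabilities of $X_i = +1$. The text does not specify what happens when a conditioning event $\{X_j = \pm, \mathbf{X}_S = \mathbf{x}_S\}$ is never observed; skipping such configurations (and returning 0.0 if all are skipped) is an assumption of this sketch.

```python
import itertools


def empirical_cond_variation(samples, i, j, S):
    """Empirical conditional variation distance (14) for +/-1 samples.

    samples: list of tuples in {-1, +1}^p; i, j: node indices; S: tuple of indices.
    Assumption of this sketch: configurations x_S for which X_j = +1 or X_j = -1
    is never observed jointly with X_S = x_S are skipped; if all are skipped,
    0.0 is returned.
    """
    best = None
    for x_S in itertools.product((-1, 1), repeat=len(S)):
        rows = [x for x in samples if all(x[k] == v for k, v in zip(S, x_S))]
        given_plus = [x[i] for x in rows if x[j] == 1]
        given_minus = [x[i] for x in rows if x[j] == -1]
        if not given_plus or not given_minus:
            continue
        # For binary X_i, (1/2) * sum over x_i of |difference| equals the absolute
        # difference of the two conditional frequencies of X_i = +1.
        dist = abs(given_plus.count(1) / len(given_plus)
                   - given_minus.count(1) / len(given_minus))
        best = dist if best is None else min(best, dist)
    return 0.0 if best is None else best
```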



⁵A set $S(A,B;G) \subset V$ is a separator of sets $A$ and $B$ if the removal of the nodes in $S(A,B;G)$ separates $A$ and $B$ into distinct components.



Our algorithm will be based on empirical conditional variation distances. This is because the conditional variation distances⁶ can be used as a test for conditional independence:

$$\{X_i \perp X_j\,|\,\mathbf{X}_S\} \equiv \{\nu_{i|j;S} = 0\}, \quad \forall\, i,j \in V,\; S \subset V\setminus\{i,j\}. \tag{15}$$
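To illustrate (15) on a toy example, the sketch below computes the exact conditional variation distance (12)-(13) by brute-force enumeration for a three-node chain with nodes 0 - 1 - 2 (zero-indexed). Conditioning on the middle node drives $\nu_{0|2;\{1\}}$ to zero up to floating-point error, whereas $\nu_{0|2;\emptyset}$ is strictly positive. The potentials in the usage check are arbitrary illustrative values, and the function names are hypothetical.

```python
import itertools
import math


def ising_pmf(J, h):
    """Brute-force Ising pmf (10) over {-1, +1}^p (small p only)."""
    p = len(h)
    weights = {
        x: math.exp(0.5 * sum(J[a][b] * x[a] * x[b] for a in range(p) for b in range(p))
                    + sum(h[a] * x[a] for a in range(p)))
        for x in itertools.product((-1, 1), repeat=p)
    }
    Z = sum(weights.values())
    return {x: w / Z for x, w in weights.items()}


def cond_variation(pmf, i, j, S):
    """Exact nu_{i|j;S} from (12)-(13); assumes P(x) > 0 for every x."""
    best = math.inf
    for x_S in itertools.product((-1, 1), repeat=len(S)):
        prob_plus = {}
        for x_j in (1, -1):
            event = {x: q for x, q in pmf.items()
                     if x[j] == x_j and all(x[k] == v for k, v in zip(S, x_S))}
            prob_plus[x_j] = sum(q for x, q in event.items() if x[i] == 1) / sum(event.values())
        # Binary X_i: the variation distance is |P(X_i=+1 | +, x_S) - P(X_i=+1 | -, x_S)|.
        best = min(best, abs(prob_plus[1] - prob_plus[-1]))
    return best
```

For a two-node model with edge potential $J$ and $\mathbf{h} = \mathbf{0}$, the same function returns $\tanh J$.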

2.3. Tractable Graph Families. We consider the class of Ising models Markov on a graph $G_p$ belonging to some ensemble $\mathcal{G}(p)$ of graphs with $p$ nodes. We consider the high-dimensional regime, where both $p$ and the number of samples $n$ grow simultaneously; typically, the growth of $p$ is much faster than that of $n$. We emphasize that in our formulation the graph ensemble $\mathcal{G}(p)$ can be either deterministic or random; in the latter case, we also specify a probability measure over the set of graphs in $\mathcal{G}(p)$. In the setting where $\mathcal{G}(p)$ is a random-graph ensemble, let $P_{\mathbf{X},G}$ denote the joint probability distribution of the variables $\mathbf{X}$ and the graph $G \sim \mathcal{G}(p)$, and let $P_{\mathbf{X}|G}$ denote the conditional distribution of the variables given a graph $G$. Let $P_G$ denote the probability distribution of a graph $G$ drawn from a random ensemble $\mathcal{G}(p)$. In this setting, we say that almost every (a.e.) graph $G$ satisfies a certain property $Q$ if

$$\lim_{p\to\infty} P_G[G \text{ satisfies } Q] = 1.$$

In other words, the property $Q$ holds asymptotically almost surely⁷ (a.a.s.) with respect to the random-graph ensemble $\mathcal{G}(p)$. Our conditions and theoretical guarantees will be based on this notion for random graph ensembles. Intuitively, this means that graphs that have a vanishing probability of occurrence as $p \to \infty$ are ignored.

We now characterize the ensemble of graphs amenable to consistent structure estimation under our formulation. To this end, we characterize the so-called local separators in graphs. See Fig. 1 for an illustration. For $\gamma \in \mathbb{N}$, let $B_\gamma(i;G)$ denote the set of vertices within distance $\gamma$ from $i$ with respect to graph $G$. Let $H_{\gamma;i} := G(B_\gamma(i))$ denote the subgraph of $G$ spanned by $B_\gamma(i;G)$; in addition, we retain the nodes not in $B_\gamma(i)$ (and remove the corresponding edges).

Definition 1 ($\gamma$-Local Separator). Given a graph $G$, a $\gamma$-local separator $S_\gamma(i,j)$ between $i$ and $j$, for $(i,j) \notin G$, is a minimal vertex separator⁸ with respect to the subgraph $H_{\gamma;i}$. In addition, the parameter $\gamma$ is referred to as the path threshold for local separation.

In other words, the $\gamma$-local separator $S_\gamma(i,j)$ separates nodes $i$ and $j$ with respect to paths in $G$ of length at most $\gamma$. We now characterize the ensemble of graphs based on the size of local separators.

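Definition 2 ($(\eta,\gamma)$-Local Separation Property). An ensemble of graphs $\mathcal{G}(p;\eta,\gamma)$ satisfies the $(\eta,\gamma)$-local separation property if for a.e. $G_p \in \mathcal{G}(p;\eta,\gamma)$,

$$\max_{(i,j)\notin G_p} |S_\gamma(i,j)| \leq \eta. \tag{16}$$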
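Definition 1 can be checked by brute force on small graphs. The sketch below (hypothetical function names, graph given as a dictionary of neighbor sets) computes $B_\gamma(i;G)$ by breadth-first search, restricts paths to the subgraph induced by the ball, and searches subsets of increasing size for the smallest set separating $i$ from $j$. Exhaustive search is used only for clarity; it is exponential in the separator size, and a max-flow based minimum vertex cut would be the usual choice for larger graphs.

```python
import itertools
from collections import deque


def ball(adj, i, gamma):
    """B_gamma(i; G): all nodes within graph distance gamma of i (breadth-first search)."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if dist[u] == gamma:
            continue  # do not expand beyond distance gamma
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def reachable(adj, allowed, s, t, removed):
    """True if t can be reached from s through nodes in `allowed` and not in `removed`."""
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        if u == t:
            return True
        for v in adj[u]:
            if v in allowed and v not in removed and v not in seen:
                seen.add(v)
                stack.append(v)
    return False


def local_separator(adj, i, j, gamma):
    """Smallest vertex set separating non-neighbors i and j within the subgraph
    induced by B_gamma(i; G), found by exhaustive search (small graphs only)."""
    if j in adj[i]:
        raise ValueError("local separators are defined only for non-edges")
    region = ball(adj, i, gamma)
    candidates = sorted(region - {i, j})
    for size in range(len(candidates) + 1):
        for S in itertools.combinations(candidates, size):
            if not reachable(adj, region, i, j, set(S)):
                return set(S)
    return set()  # not reached: removing every candidate always separates i and j
```

On a graph shaped like Fig. 1, where $i$ and $j$ share neighbors $a, b$ and are also joined by a long path through $c$, the search returns $\{a, b\}$ for $\gamma = 4$, while a $\gamma$ large enough to contain the whole path forces a separator of size three.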

In Section 3, we propose an efficient algorithm for graphical model selection when the underlying graph belongs to a graph ensemble $\mathcal{G}(p;\eta,\gamma)$ with sparse local separators (i.e., small $\eta$, for $\eta$ defined in (16)). We will see that the computational complexity of our proposed algorithm scales as $O(p^{\eta+2})$. In Section 3.3, we provide examples of many graph families satisfying (16), which include the random regular graphs, Erdos-Renyi random graphs and small-world graphs.

⁶Note that the conditional variation distances are in general asymmetric, i.e., $\nu_{i|j;S} \neq \nu_{j|i;S}$.

⁷Note that the term a.a.s. does not apply to deterministic graph ensembles $\mathcal{G}(p)$ where no randomness is assumed, and in this setting, we assume that the property $Q$ holds for every graph in the ensemble.

⁸A minimal separator is a separator of smallest cardinality.

Fig 1. Illustration of the $l$-local separator set $S(i,j;G,l)$ for the graph shown above with $l = 4$. Note that $\mathcal{N}(i) = \{a,b,c,d\}$ is the neighborhood of $i$ and the $l$-local separator set $S(i,j;G,l) = \{a,b\} \subset \mathcal{N}(i;G)$. This is because the path along $c$ connecting $i$ and $j$ has a length greater than $l$, and hence node $c \notin S(i,j;G,l)$.

Remark: The criterion of local separation for tractable learning is novel to the best of our knowledge. The complexity of a graphical model is usually expressed in terms of its treewidth [56]. We note that the criterion of sparse local separation is weaker than that of bounded treewidth, i.e., $\eta \leq t$, where $t$ is the treewidth of the graph. In fact, our criterion is also weaker than the criterion of bounded local treewidth introduced in [25].

3. Method and Guarantees. 

3.1. Assumptions. 

(A1) Sample Complexity: We consider the asymptotic setting where both the number of variables (nodes) $p$ and the number of i.i.d. samples $n$ go to infinity. The required sample complexity is

$$n = \Omega(J_{\min}^{-2}\log p). \tag{17}$$

We require that the number of nodes $p \to \infty$ to exploit the local-separation properties of the class of graphs under consideration.
(A2) Bounded Edge Potentials: The Ising model Markov on a.e. $G_p \sim \mathcal{G}(p)$ has the maximum absolute potential below a threshold $J^*$. More precisely,

$$\alpha := \frac{\tanh J_{\max}}{\tanh J^*} < 1, \tag{18}$$

where the threshold $J^*$ depends on the specific graph ensemble $\mathcal{G}(p)$. See Section B.1 for an explicit characterization of $J^*$ for specific ensembles.
(A3) Local-Separation Property: We consider the ensemble of graphs $\mathcal{G}(p)$ such that almost every graph $G$ drawn from $\mathcal{G}(p)$ satisfies the $(\eta,\gamma)$-local separation property according to Definition 2, for some $\eta = O(1)$ and $\gamma \in \mathbb{N}$ such that⁹

$$J_{\min}\,\alpha^{-\gamma} = \omega(1), \tag{19}$$

where we say that a function $f(p) = \omega(g(p))$ if $f(p)/g(p) \to \infty$ as $p \to \infty$.



⁹The condition in (19) involving $\omega(1)$ is required for random graph ensembles such as Erdos-Renyi random graphs. It can be weakened to $J_{\min}\,\alpha^{-\gamma} = \Omega(1)$ for degree-bounded ensembles $\mathcal{G}_{\mathrm{Deg}}(\Delta)$.



(A4) Generic Edge Potentials: The edge potentials $\{J_{ij}, (i,j) \in G\}$ of the Ising model are assumed to be generically drawn from $[-J_{\max}, -J_{\min}] \cup [J_{\min}, J_{\max}]$, i.e., our results hold except for a set of Lebesgue measure zero. We also characterize specific classes of models where this assumption can be removed and we allow for any choice of edge potentials. See Section B.3 for details.

Assumption (A1) provides the bound on the sample complexity. Assumption (A2) limits the maximum edge potential $J_{\max}$ of the model. Assumption (A3) relates the path threshold $\gamma$ with the minimum edge potential $J_{\min}$ in the model. For instance, if $J_{\min} = \Theta(1)$ and $\gamma = O(\log\log p)$, we require that $\alpha := \frac{\tanh J_{\max}}{\tanh J^*} = 1 - \Theta(1) < 1$.
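A quick numeric sketch of (18) and (19): with hypothetical constants $J_{\min} = J_{\max} = 0.3$ and $J^* = 0.5$ (chosen only for illustration, not taken from any specific ensemble), $\alpha \approx 0.63$, and $J_{\min}\alpha^{-\gamma}$ grows as $\gamma$ grows, here taken as $\lceil\log\log p\rceil$. Python's `math.tanh(math.inf)` evaluates to 1.0, so $J^* = \infty$ can be passed directly.

```python
import math


def alpha(J_max, J_star):
    """alpha = tanh(J_max) / tanh(J*) from (18); J_star may be math.inf."""
    return math.tanh(J_max) / math.tanh(J_star)


def a3_quantity(J_min, J_max, J_star, gamma):
    """J_min * alpha**(-gamma), which (19) requires to diverge."""
    return J_min * alpha(J_max, J_star) ** (-gamma)


if __name__ == "__main__":
    # Hypothetical constants, for illustration only.
    J_min, J_max, J_star = 0.3, 0.3, 0.5
    print("alpha =", alpha(J_max, J_star))
    for p in (10**3, 10**6, 10**9):
        gamma = math.ceil(math.log(math.log(p)))  # gamma = O(log log p)
        print(p, gamma, a3_quantity(J_min, J_max, J_star, gamma))
```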

Condition (A4) guarantees the success of our method for generic edge potentials. Note that if the neighbors are marginally independent, then our method fails, and thus we cannot expect our method to succeed for all edge potentials. Condition (A4) can be removed if we restrict to attractive models (see Section B.3.1), or if we allow for non-attractive models but restrict to graphs with bounded local paths (see Section B.3.3). For general models, we guarantee success of our methods for generic potentials, i.e., we establish that the set of edge potentials where our method fails has Lebesgue measure zero. Similar assumptions have been previously employed; e.g., in [31], where learning directed models is considered, it is assumed that the graphical model is faithful with respect to the underlying graph.

3.2. Conditional Variation Distance Thresholding. We now propose an algorithm, termed conditional variation distance thresholding (CVDT), which is proven to be consistent for graph reconstruction under the above assumptions. The procedure for CVDT is provided in Algorithm 1. Denote by $\mathrm{CVDT}(\mathbf{x}^n;\xi_{n,p})$ the output edge set from CVDT given $n$ i.i.d. samples $\mathbf{x}^n$ and threshold $\xi_{n,p}$. The conditional variation distance test in the CVDT algorithm computes the empirical conditional variation distance in (14) for each node pair $(i,j) \in V^2$ and finds the conditioning set which achieves the minimum over all sets of cardinality at most $\eta$. If the minimum exceeds the threshold $\xi_{n,p}$, the node pair is declared an edge.

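The threshold $\xi_{n,p}$ needs to separate the edges and the non-edges in the Ising model. It is chosen as a function of both the number of nodes $p$ and the number of samples $n$, and needs to satisfy the following conditions:

$$\xi_{n,p} = O(J_{\min}), \qquad \xi_{n,p} = \omega(\alpha^{\gamma}), \qquad \xi_{n,p} = \Omega\Big(\sqrt{\frac{\log p}{n}}\Big). \tag{20}$$

For example, when $J_{\min} = \Theta(1)$, $\alpha < 1$, $\gamma = \Theta(\log p)$ and $n = \Omega(g_p\log p)$ for some sequence $g_p = \omega(1)$, the term $\alpha^{\gamma}$ decays polynomially in $p$ while $\sqrt{\log p/n} = O(g_p^{-1/2})$ vanishes, so a sufficiently slowly decaying threshold $\xi_{n,p}$ satisfies all three conditions in (20).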



Note that there is dependence on both $n$ and $p$, since we need to regularize for the sample size as well as the size of the graph. In other words, with a finite number of samples $n$, the empirical conditional variation distances are noisy, and the threshold $\xi_{n,p}$ takes this into account via its inverse dependence on $n$. Similarly, as the graph size $p$ increases, we establish that the true conditional variation distance between non-neighbors decays at a certain rate under assumption (A2). Hence, the threshold $\xi_{n,p}$ also depends on the graph size $p$. Moreover, note that for all the conditions in (20) to be satisfied, the number of samples $n$ should scale at least at a certain rate with respect to $p$, as given by (17).

3.2.1. Structural Consistency of CVDT. Assuming (A1)-(A4), we have the following result on asymptotic graph structure recovery.



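Algorithm 1 Algorithm $\mathrm{CVDT}(\mathbf{x}^n;\xi_{n,p},\eta)$ for structure learning from samples $\mathbf{x}^n$ based on empirical conditional variation distances. See (14).

Initialize $\widehat{G}^n_p = (V, \emptyset)$.
For each $i, j \in V$, if

$$\min_{\substack{S\subset V\setminus\{i,j\}\\ |S|\leq\eta}} \widehat{\nu}^n_{i|j;S} > \xi_{n,p}, \tag{21}$$

then add $(i,j)$ to $\widehat{G}^n_p$.
Output: $\widehat{G}^n_p$.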
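A compact sketch of Algorithm 1 for $\pm 1$ samples is given below. It is an illustrative, unoptimized implementation under assumptions not fixed by the text: node pairs are scanned as ordered pairs and the undirected edge $(i,j)$ is added if test (21) passes in either direction; configurations with an unobserved conditioning event are skipped; and the threshold `xi` and the bound `eta` are supplied by the user rather than derived from (20). The function names are hypothetical.

```python
import itertools


def empirical_cond_variation(samples, i, j, S):
    """nu-hat_{i|j;S} from (14) for +/-1 samples (minimum over configurations x_S).
    Assumption: x_S with an unobserved conditioning event is skipped (0.0 if all are)."""
    best = None
    for x_S in itertools.product((-1, 1), repeat=len(S)):
        rows = [x for x in samples if all(x[k] == v for k, v in zip(S, x_S))]
        given_plus = [x[i] for x in rows if x[j] == 1]
        given_minus = [x[i] for x in rows if x[j] == -1]
        if not given_plus or not given_minus:
            continue
        # Binary X_i: variation distance = |P-hat(X_i=+1 | X_j=+) - P-hat(X_i=+1 | X_j=-)|.
        dist = abs(given_plus.count(1) / len(given_plus)
                   - given_minus.count(1) / len(given_minus))
        best = dist if best is None else min(best, dist)
    return 0.0 if best is None else best


def cvdt(samples, xi, eta):
    """Sketch of Algorithm 1: add (i, j) when the min over S of nu-hat_{i|j;S} exceeds xi,
    with S ranging over subsets of V minus {i, j} of size at most eta."""
    p = len(samples[0])
    edges = set()
    for i, j in itertools.permutations(range(p), 2):  # ordered pairs (assumption)
        others = [k for k in range(p) if k not in (i, j)]
        statistic = min(
            empirical_cond_variation(samples, i, j, S)
            for size in range(eta + 1)
            for S in itertools.combinations(others, size)
        )
        if statistic > xi:
            edges.add((min(i, j), max(i, j)))  # store as an undirected edge
    return edges
```

The cost follows the $O(p^{\eta+2})$ count of (pair, conditioning set) combinations, multiplied here by a linear scan over the $n$ samples for each configuration $\mathbf{x}_S$.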



Theorem 1 (Structural consistency of CVDT). The algorithm CVDT is consistent for structure recovery of Ising models Markov on a.e. graph $G_p \sim \mathcal{G}(p;\eta,\gamma)$:

$$\lim_{p,n\to\infty} P\big[\mathrm{CVDT}(\{\mathbf{x}^n\};\xi_{n,p},\eta) \neq G_p\big] = 0. \tag{22}$$

The proof of this theorem is provided in Section B.

Remarks:

1. Consistency guarantee: The CVDT algorithm consistently recovers the structure of the graphical models with probability tending to one, where the probability measure is with respect to both the graph and the samples. We extend our results and provide finite-sample guarantees for specific graph families in Section 3.2.2. Moreover, if we require a parameter-free threshold, i.e., we do not know the exact value of $J_{\min}$ but only its scaling with $p$, then we need to choose $\xi_{n,p} = o(J_{\min})$ rather than $\xi_{n,p} = O(J_{\min})$. In this case, the sample complexity scales as $n = \omega(J_{\min}^{-2}\log p)$.

2. Other Tests for Conditional Independence: We considered a test based on variation distances. Alternatively, other distance measures can be employed. For instance, it can be proven that the Hellinger distance and the Kullback-Leibler distance have similar sample complexity results, while a test based on mutual information has a worse sample complexity of $\Omega(J_{\min}^{-4}\log p)$ under the assumptions (A1)-(A4).

3. Extension to other models: The CVDT algorithm can be extended to general discrete models by considering the pairwise variation distance between different configurations. For instance, we can set

$$\nu_{i|j;S} := \sum_{\lambda_1,\lambda_2\in\mathcal{X}} \min_{\mathbf{x}_S\in\mathcal{X}^{|S|}} \nu\big(P(X_i\,|\,X_j = \lambda_1, \mathbf{X}_S = \mathbf{x}_S),\, P(X_i\,|\,X_j = \lambda_2, \mathbf{X}_S = \mathbf{x}_S)\big). \tag{23}$$

In [4], we derive analogous conditions for Gaussian graphical models. Our approach is also applicable to models with higher-order potentials, since it does not depend on the pairwise nature of Ising models. The conditions for recovery are based on the notion of conditional uniqueness and can be imposed on any model. Indeed, the regime of parameters where conditional uniqueness holds depends on the model and is harder to characterize for more complex models. Notice that our algorithm requires only low-order statistics (up to order $\eta + 2$) for any class of graphical models, which is relevant when we are dealing with models with higher-order potentials.
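The gap between the variation-distance and mutual-information tests in Remark 2 can be seen heuristically on the symmetric two-node Ising model $P(x_1, x_2) \propto \exp(J x_1 x_2)$: there $\nu_{1|2;\emptyset} = \tanh J \approx J$, whereas $I(X_1;X_2) \approx J^2/2$ for small $J$. Since a threshold test on a statistic of size $s$ typically needs on the order of $s^{-2}\log p$ samples, this suggests the $J_{\min}^{-2}$ versus $J_{\min}^{-4}$ scaling. The sketch below only checks the two small-$J$ approximations and is not a proof; the function name is hypothetical.

```python
import math


def two_node_statistics(J):
    """Symmetric two-node Ising model P(x1, x2) proportional to exp(J * x1 * x2).

    Returns (nu, mi): the conditional variation distance nu_{1|2;{}} and the
    mutual information I(X1; X2) in nats. Intended for moderate |J|
    (the entropy formula below needs 0 < q < 1).
    """
    t = math.tanh(J)
    # P(X1 = +1 | X2 = +1) = (1 + t) / 2 and P(X1 = +1 | X2 = -1) = (1 - t) / 2.
    nu = abs(t)
    # X1 is uniform and differs from X2 with probability q, so I = log 2 - H(q).
    q = (1 - t) / 2
    binary_entropy = -q * math.log(q) - (1 - q) * math.log(1 - q)
    return nu, math.log(2) - binary_entropy


if __name__ == "__main__":
    for J in (0.2, 0.1, 0.05):
        nu, mi = two_node_statistics(J)
        print(f"J={J}: nu/J={nu / J:.4f}, I/J^2={mi / J**2:.4f}")
```

Halving $J$ roughly halves $\nu$ but quarters the mutual information, which is the source of the heuristic gap.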
Proof Outline: We first analyze the scenario when exact statistics are available. (i) We establish that for any two non-neighbors $(i,j) \notin G$, the conditional variation distance in (21) (based on exact statistics) does not exceed the threshold $\xi_{n,p}$. (ii) Similarly, we establish that the conditional variation distance in (21) exceeds the threshold $\xi_{n,p}$ for all neighbors $(i,j) \in G$. (iii) We then extend these results to the empirical versions using concentration bounds.

3.2.2. PAC Guarantees for CVDT. We now provide stronger results for the CVDT method in terms of the probably approximately correct (PAC) model of learning [33]. This provides additional insight into the task of graph estimation. Given an Ising model $P$ on graph $G_p$, recall the definition of the conditional variation distance

$$\nu_{i|j;S} := \min_{\mathbf{x}_S\in\{-1,+1\}^{|S|}} \nu\big(P(X_i\,|\,X_j = +, \mathbf{X}_S = \mathbf{x}_S),\, P(X_i\,|\,X_j = -, \mathbf{X}_S = \mathbf{x}_S)\big).$$

Given a graph $G_p$ and $\lambda, \eta > 0$, define

$$G'_p(\eta;\lambda) := \Big\{(i,j)\in G_p : \min_{\substack{S\subset V\setminus\{i,j\}\\ |S|\leq\eta}} \nu_{i|j;S} > \lambda\Big\}, \tag{24}$$

$$\nu_{\max}(p;\eta) := \max_{(i,j)\notin G_p}\ \min_{\substack{S\subset V\setminus\{i,j\}\\ |S|\leq\eta}} \nu_{i|j;S}. \tag{25}$$

For any $\delta > 0$, choose the threshold $\xi_{n,p}$ as

$$\xi_{n,p}(\delta) = \nu_{\max}(p;\eta) + \delta. \tag{26}$$

Define, 

(27) P^i„:= min P(Xs = X5). 

scv,\s\<v+i 

x={±l}ISI 
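The test behind (24)-(26) can be sketched in a few lines. The Python code below is a hypothetical rendering of CVDT, not the authors' implementation: it assumes that a pair $(i,j)$ is declared an edge when $\min_{|S| \le \eta} \hat\nu_{i|j;S}$ exceeds the threshold $\xi_{n,p}$, and it uses a compact version of an empirical estimator that skips unobserved configurations. Enumerating all $S$ with $|S| \le \eta$ for each of the $O(p^2)$ pairs is what produces the $O(p^{\eta+2})$ cost.

```python
import itertools
import numpy as np


def empirical_cvd(X, i, j, S):
    """Empirical nu_{i|j;S} from +/-1 samples; unobserved configurations x_S are skipped."""
    best = np.inf
    for xs in itertools.product([-1, 1], repeat=len(S)):
        match_s = np.all(X[:, list(S)] == np.array(xs, dtype=X.dtype), axis=1)
        plus, minus = match_s & (X[:, j] == 1), match_s & (X[:, j] == -1)
        if plus.any() and minus.any():
            diff = np.mean(X[plus, i] == 1) - np.mean(X[minus, i] == 1)
            best = min(best, abs(diff))
    return float(best) if np.isfinite(best) else 0.0


def cvdt(X, xi, eta):
    """Hypothetical CVDT: keep (i, j) if min over |S| <= eta of nu_hat exceeds xi."""
    p = X.shape[1]
    edges = set()
    for i, j in itertools.combinations(range(p), 2):
        others = [k for k in range(p) if k not in (i, j)]
        score = min(empirical_cvd(X, i, j, S)
                    for size in range(eta + 1)
                    for S in itertools.combinations(others, size))
        if score > xi:
            edges.add((i, j))
    return edges


# Usage on a 4-node path X0 - X1 - X2 - X3 with J = 0.8 (toy example).
rng = np.random.default_rng(1)
configs = np.array(list(itertools.product([-1, 1], repeat=4)))
energy = sum(configs[:, k] * configs[:, k + 1] for k in range(3))
probs = np.exp(0.8 * energy)
X = configs[rng.choice(len(configs), size=20000, p=probs / probs.sum())]
print(sorted(cvdt(X, xi=0.3, eta=1)))
```

On a path, a single node always separates non-neighbors, so $\eta = 1$ suffices; the threshold 0.3 is an arbitrary choice lying between the exact non-edge value (zero) and the smallest exact edge value for this chain.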

Theorem 2 (PAC Guarantees for CVDT). Given an Ising model Markov on graph $G_p$ and threshold
$\xi_{n,p}(\delta)$ according to (26), CVDT($\{\mathbf{x}^n\}; \xi_{n,p}(\delta), \eta$) recovers $G'_p(\eta; \nu_{\max}(p;\eta) + 2\delta)$, defined in (24), for any
$\delta > 0$, with probability at least $1 - \epsilon$, when the number of samples is



(28) $n > \frac{2(\delta+2)^2}{\delta^2 P_{\min}^2}\Big[\log\frac{1}{\epsilon} + (\eta+2)\log p + (\eta+4)\log 2\Big],$

and the computational complexity scales as $O(p^{\eta+2})$.

Proof: The proof is provided in Section C.3. □

Thus, the above result characterizes the relationship between the separation of edges and
non-edges (in terms of conditional variation distances) and the number of samples required to
distinguish them. A critical parameter in the above result is $\nu_{\max}(p;\eta)$, the maximum conditional
variation distance between non-neighbors. We now provide non-asymptotic bounds on $\nu_{\max}(p;\eta)$
for specific graph families satisfying the $(\eta,\gamma)$-local separation condition. A detailed description
of the graph families considered below is provided in Section 3.3. Along the lines of assumption (A2) in
Section 3.1, define

(29) $\alpha := \frac{\tanh J_{\max}}{\tanh J^*}.$

As we noted earlier, the threshold $J^*$ depends on the graph family. We characterize both $J^*$ and
$\nu_{\max}(p;\eta)$ for various graph families below.
Lemma 1 (Non-asymptotic Bounds on $\nu_{\max}(p;\eta)$ for Graph Families). The following statements
hold for $\alpha$ in (29):

1. For the degree-bounded ensemble $\mathcal{G}_{\mathrm{Deg}}(p;\Delta)$,

(30) $J^*_{\mathrm{Deg}} = \infty, \qquad \nu_{\max}(p;\Delta) = 0.$

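2. For the girth-bounded ensemble $\mathcal{G}_{\mathrm{Girth}}(p;g,\Delta)$,

(31) $J^*_{\mathrm{Girth}} = \operatorname{atanh}\big(\tfrac{1}{\Delta}\big), \qquad \nu_{\max}(p;1) \le \alpha^{g/2},$

where $\Delta$ is the maximum degree and $g$ is the girth.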

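3. For the ensemble of $\Delta$-random regular graphs $\mathcal{G}_{\mathrm{Reg}}(p;\Delta)$,

(32) $J^*_{\mathrm{Reg}} = \operatorname{atanh}\big(\tfrac{1}{\Delta}\big).$

Choose any $l \in \mathbb{N}$ such that $l < 0.5(0.25p\Delta + 0.5 - \Delta^2)$. Then, with probability at least
$1 - \Delta^{8l-2}(p\Delta - 4\Delta^2 - 8l)^{-(4l-1)}$,

(33) $\nu_{\max}(p;2) \le \alpha^{l},$

where $\Delta$ is the degree.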

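4. For the Erdos-Renyi ensemble $\mathcal{G}_{\mathrm{ER}}(p,c/p)$,

(34) $J^*_{\mathrm{ER}} = \operatorname{atanh}\big(\tfrac{1}{c}\big).$

Choose any $l \in \mathbb{N}$ such that $l < \frac{\log p}{4\log c}$. When $c > 1$, then with probability at least 1 −
leVT2Ep-2.5 _ iic^i-ip-i^,

(35) $\nu_{\max}(p;2) \le 4 l \alpha^{l} \log p,$

where $c$ is the average degree.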

5. For the small-world graph ensemble $\mathcal{G}_{\mathrm{Watts}}(p,d,c/p)$, similar results apply:

(36) $J^*_{\mathrm{Watts}} = \operatorname{atanh}\big(\tfrac{1}{c}\big).$

Choose any $l \in \mathbb{N}$ such that $l < \frac{\log p}{4\log c}$. When $c > 1$, with probability at least 1 − le^^'^^p~'^'^ −

(37) $\nu_{\max}(p;d+2) \le 4 l \alpha^{l} \log p,$

where $c$ is the average degree of the Erdos-Renyi subgraph.

Proof: See Corollaries 1 and 2 in Section B.1. □

Thus, we note that the conditional variation distance is small for non-neighbors when the
maximum edge potential $J_{\max}$ is suitably bounded. Combining the results above on $\nu_{\max}(p;\eta)$ with the
PAC guarantees in Theorem 2, we note that a majority of edges in the Ising model can be learned
efficiently with a logarithmic sample complexity.
3.3. Examples of Tractable Graph Families. We now show that the local-separation property in
Definition 2 and the assumptions in Section 3.1 hold for a rich class of graphs.

Example 1: Bounded Degree. Any (deterministic or random) ensemble of degree-bounded graphs
$\mathcal{G}_{\mathrm{Deg}}(p;\Delta)$ satisfies the $(\eta,\gamma)$-local separation property with $\eta = \Delta$ and arbitrary $\gamma \in \mathbb{N}$. This is because
for any node $i \in V$, its neighborhood $\mathcal{N}(i)$ exactly separates it from its non-neighbors. Since there
is exact separation, we can establish that the threshold in (18) is infinite ($J^*_{\mathrm{Deg}} = \infty$), i.e., there
is no constraint on the maximum edge potential $J_{\max}$. However, the computational complexity of
our proposed algorithm scales as $O(p^{\Delta+2})$ (see also [11]). Thus, when $\Delta$ is large, our proposed
algorithm, as well as the algorithm in [11], is computationally intensive. Our goal in this paper
is to relax the bounded-degree assumption and to consider sequences of ensembles of graphs $\mathcal{G}(p)$
whose maximum degrees may grow with the number of nodes $p$. To this end, we discuss other
structural constraints which can lead to graphs with sparse local separators.
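The exact-separation argument can be checked numerically on a small example. The sketch below assumes a five-node cycle (maximum degree 2) with arbitrary, hypothetical couplings and fields, enumerates the Ising distribution exactly, and shows that conditioning on the neighborhood $\mathcal{N}(0) = \{1, 4\}$ drives the conditional variation distance to the non-neighbor 2 to zero, whereas the unconditioned distance is bounded away from zero.

```python
import itertools
import math


def ising_joint(p, J, h):
    """Exact pmf of an Ising model with couplings J[(a, b)] and fields h[k]."""
    configs = list(itertools.product([-1, 1], repeat=p))
    w = [math.exp(sum(Jab * x[a] * x[b] for (a, b), Jab in J.items())
                  + sum(hk * xk for hk, xk in zip(h, x))) for x in configs]
    Z = sum(w)
    return {x: wx / Z for x, wx in zip(configs, w)}


def exact_cvd(joint, i, j, S):
    """Exact nu_{i|j;S} = min over x_S of |P(X_i=+|X_j=+,x_S) - P(X_i=+|X_j=-,x_S)|."""
    best = float("inf")
    for xs in itertools.product([-1, 1], repeat=len(S)):
        cond = []
        for xj in (1, -1):
            match = [(x, pr) for x, pr in joint.items()
                     if x[j] == xj and all(x[s] == v for s, v in zip(S, xs))]
            cond.append(sum(pr for x, pr in match if x[i] == 1) / sum(pr for _, pr in match))
        best = min(best, abs(cond[0] - cond[1]))
    return best


# Five-cycle 0-1-2-3-4-0 with hypothetical potentials (maximum degree 2).
J = {(0, 1): 0.9, (1, 2): -0.7, (2, 3): 1.1, (3, 4): 0.5, (4, 0): -1.2}
h = [0.3, -0.2, 0.1, 0.0, 0.4]
joint = ising_joint(5, J, h)
nu_neighborhood = exact_cvd(joint, 0, 2, [1, 4])  # S = N(0): exact separation
nu_empty = exact_cvd(joint, 0, 2, [])             # no separator
```

Note that the enumeration costs $2^p$ operations, so this is only a sanity check for tiny graphs, not a learning procedure.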

Example 2: Bounded Local Paths. Another sufficient condition¹⁰ for the $(\eta,\gamma)$-local separation
property in Definition 2 to hold is that there are at most $\eta$ paths of length at most $\gamma$ in $G$ between
any two nodes (henceforth termed the $(\eta,\gamma)$-local paths property). In other words, there are at
most $\eta - 1$ overlapping¹¹ cycles of length smaller than $2\gamma$. We denote this ensemble of
graphs as $\mathcal{G}_{\mathrm{LP}}(p;\eta,\gamma)$.
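Whether a small graph has the $(\eta,\gamma)$-local paths property can be checked by exhaustively counting simple paths. The following sketch uses a depth-first enumeration over an adjacency dictionary; the function names and the example graphs are hypothetical, and the cost grows exponentially with $\gamma$, so it is intended only for small examples.

```python
def count_paths(adj, u, v, gamma):
    """Number of simple paths from u to v (u != v) with at most gamma edges."""
    count = 0
    stack = [(u, (u,))]
    while stack:
        node, path = stack.pop()
        if len(path) > gamma:        # the path already has gamma edges; cannot extend
            continue
        for nb in adj[node]:
            if nb == v:
                count += 1           # reached v using len(path) <= gamma edges
            elif nb not in path:
                stack.append((nb, path + (nb,)))
    return count


def has_local_paths_property(adj, eta, gamma):
    """True if every node pair is joined by at most eta paths of length <= gamma."""
    nodes = sorted(adj)
    return all(count_paths(adj, a, b, gamma) <= eta
               for k, a in enumerate(nodes) for b in nodes[k + 1:])


c4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
k4 = {a: [b for b in range(4) if b != a] for a in range(4)}
print(count_paths(c4, 0, 2, 2))                       # 2: 0-1-2 and 0-3-2
print(count_paths(k4, 0, 1, 3))                       # 5
print(has_local_paths_property(c4, eta=2, gamma=3))   # True
```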

In particular, a special case of the local-paths property described above is the so-called girth
property. The girth of a graph is the length of its shortest cycle. Thus, a graph with girth $g$
satisfies the $(\eta,\gamma)$-local separation property with $\eta = 1$ and $\gamma = g/2$. Let $\mathcal{G}_{\mathrm{Girth}}(p;g)$ denote the ensemble
of graphs with girth at least $g$. There are many graph constructions which lead to large girth. For
example, the bipartite Ramanujan graph [18, p. 107] and the random Cayley graphs [27] have large
girths. Recently, efficient algorithms have been proposed to generate large-girth graphs [5].
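The girth itself is easy to compute for moderate graphs: run a breadth-first search from every node and record the shortest closed walk formed by a non-tree edge. The sketch below (hypothetical function names; adjacency dictionaries assumed) returns `math.inf` for forests and is checked on the Petersen graph, whose girth is 5.

```python
import math
from collections import deque


def girth(adj):
    """Length of the shortest cycle (math.inf for a forest), via BFS from every node."""
    best = math.inf
    for s in adj:
        dist, parent = {s: 0}, {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w], parent[w] = dist[u] + 1, u
                    queue.append(w)
                elif parent[u] != w:
                    # non-tree edge (u, w): the BFS paths s->u, s->w plus (u, w)
                    # contain a cycle of at most this length
                    best = min(best, dist[u] + dist[w] + 1)
    return best


def from_edges(n, edges):
    """Adjacency dictionary of an undirected graph on nodes 0..n-1."""
    adj = {v: [] for v in range(n)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    return adj


petersen = from_edges(10, [(i, (i + 1) % 5) for i in range(5)]
                      + [(i, i + 5) for i in range(5)]
                      + [(5 + i, 5 + (i + 2) % 5) for i in range(5)])
print(girth(petersen))  # 5
```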

The girth condition can be weakened to allow for a small number of short cycles, while not
allowing typical node neighborhoods to contain short cycles. Such graphs are termed locally
tree-like. For instance, the ensemble of Erdos-Renyi graphs $\mathcal{G}_{\mathrm{ER}}(p,c/p)$, where an edge between
any node pair appears with probability $c/p$, independent of other node pairs, is locally tree-like.
The parameter $c$ may grow with $p$, albeit at a controlled rate for tractable structure learning,
made precise later. In Section E, we establish that there are at most two paths of length smaller
than $\gamma < \frac{\log p}{4\log c}$ between any two nodes in Erdos-Renyi graphs a.a.s., or equivalently, there are no
overlapping cycles of length smaller than $2\gamma$ a.a.s. Similar observations apply for the more general
scale-free or power-law graphs [20, 23], and we derive the precise relationships in Section E. Along
similar lines, the ensemble of $\Delta$-random regular graphs, denoted by $\mathcal{G}_{\mathrm{Reg}}(p,\Delta)$, which is the uniform
ensemble of regular graphs with degree $\Delta$, has no overlapping cycles of length at most $\Theta(\log_{\Delta-1} p)$
a.a.s. [41, Lemma 1].

We now discuss the conditions under which a general local-paths graph ensemble $\mathcal{G}_{\mathrm{LP}}(p;\eta,\gamma)$
satisfies assumption¹² (A3) in Section 3.1, required for our graph estimation algorithm CVDT to
succeed. Denote the maximum degree for the $\mathcal{G}_{\mathrm{LP}}(p;\eta,\gamma)$ ensemble as $\Delta$ (possibly growing with
$p$). Note that we can now implement the CVDT algorithm with parameter $\eta$. In Section B.1, we



¹⁰For any graph satisfying the $(\eta,\gamma)$-local separation property, the number of vertex-disjoint paths of length at most
$\gamma$ between any two non-neighbors is bounded above by $\eta$, by appealing to Menger's theorem for bounded path
lengths [40]. However, in the definition of the local-paths property, we consider all paths of length at most $\gamma$ and not just
vertex-disjoint paths.

¹¹Two cycles are said to overlap if they have common vertices.

¹²In fact, a weaker version of (A3), namely $J_{\min}\alpha^{-\gamma} = \Omega(1)$, suffices for degree-bounded ensembles $\mathcal{G}_{\mathrm{Deg}}(\Delta)$.
establish that the threshold $J^*$ in (18) is given by $J^*_{\mathrm{LP}} = \Theta(1/\Delta)$. When the minimum edge potential
$J_{\min}$ achieves the bound, i.e., $J_{\min} = \Theta(1/\Delta)$, assumption (A3) simplifies to

(38) $\Delta\,\alpha^{\gamma} = o(1).$

Note that $\alpha < 1$ under (A2). We obtain a natural tradeoff between the maximum degree $\Delta$ and
the path threshold $\gamma$.

When $\Delta = O(1)$, we can allow the path threshold in (38) to scale as $\gamma = O(\log\log p)$. This
implies that graphs with fairly small path threshold $\gamma$ can be incorporated under our framework.
In particular, this includes the class of girth-bounded graphs with fairly small girth (i.e., the girth
$g$ scaling as $O(\log\log p)$).
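Condition (38) asks for $\Delta\alpha^{\gamma}$ to vanish. To get a feel for the tradeoff, the hypothetical helper below finds the smallest integer $\gamma$ for which $\Delta\alpha^{\gamma}$ drops below a fixed target. The required $\gamma$ grows like $\log(\Delta/\text{target})/\log(1/\alpha)$, so a constant or polylogarithmic degree only needs a slowly growing path threshold. The values of $\alpha$ and of the target are arbitrary illustrative choices.

```python
def min_path_threshold(delta, alpha, target):
    """Smallest integer gamma >= 0 with delta * alpha**gamma <= target (0 < alpha < 1)."""
    gamma = 0
    while delta * alpha ** gamma > target:
        gamma += 1
    return gamma


print(min_path_threshold(4, 0.5, 0.1))    # 6
print(min_path_threshold(64, 0.5, 0.1))   # 10: 16x larger degree, only 4 more hops
```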

We can also incorporate graph families with growing maximum degrees in (38). For instance,
when $\Delta = O(\mathrm{poly}\log p)$, we require the path threshold to scale as $\gamma = O(\log p)$. In particular, the
$\Delta$-random regular ensemble satisfies (38) when $\Delta = O(\mathrm{poly}\log p)$.

Thus, (38) represents a natural tradeoff between node degrees and path threshold for consistent
structure estimation: graphs with large degrees can be learned efficiently if their path thresholds
are large. Indeed, in the extreme case of trees, which have an infinite path threshold (since they have
infinite girth), (38) imposes no constraint on node degrees for successful recovery; recall that the
Chow-Liu algorithm [17] is an efficient method for model selection on tree distributions.

Moreover, the constraint in (38) can be weakened for random graph ensembles by replacing the
maximum degree with the average degree. Recall that in the Erdos-Renyi ensemble $\mathcal{G}_{\mathrm{ER}}(p,c/p)$,
an edge between any two nodes occurs with probability $c/p$, and that this ensemble satisfies the
$(\eta,\gamma)$ property with path threshold $\gamma = O\big(\frac{\log p}{\log c}\big)$ and $\eta = 2$. In Section B.1, we establish that the
threshold in (18) is given by $J^*_{\mathrm{ER}} = \Theta(1/c)$. Comparing with the threshold $J^* = \Theta(1/\Delta)$ for $\Delta$-degree bounded
graphs discussed above, we see that we can obtain better bounds for random graph ensembles.

When the minimum edge potential achieves the threshold ($J_{\min} = \Theta(1/c)$), the requirement in
assumption (A3) in Section 3.1 simplifies to

(39) $c\,\alpha^{\gamma} = o(1),$

which is true when $c = O(\mathrm{poly}\log p)$. Thus, we can guarantee consistent structure estimation for the
Erdos-Renyi ensemble when the average degree scales as $c = O(\mathrm{poly}\log p)$. This regime is typically
known as the "sparse" regime and is relevant since, in practice, our goal is to fit the measurements
to a sparse graphical model.

Example 3: Small-World Graphs. The previous two examples showed that local separation holds
under two different conditions: bounded maximum degree and bounded number of local paths. The
former class of graphs can have short cycles, but the maximum degree needs to be constant, while
the latter class of graphs can have a large maximum degree, but the number of overlapping short
cycles needs to be small. We now provide instances which incorporate both these features, large
degrees and short cycles, and yet satisfy the local separation property.

The class of hybrid graphs or augmented graphs [20, Ch. 12] consists of graphs which are the
union of two graphs: a "local" graph having short cycles and a "global" graph having small average
distances. Since the hybrid graph is the union of these local and global graphs, it simultaneously
has large degrees and short cycles. The simplest model $\mathcal{G}_{\mathrm{Watts}}(p,d,c/p)$, first studied by Watts and
Strogatz [58], consists of the union of a $d$-dimensional grid and an Erdos-Renyi random graph with
parameter $c$. It is easily seen that a.e. graph $G \sim \mathcal{G}_{\mathrm{Watts}}(p,d,c/p)$ satisfies the $(\eta,\gamma)$-local separation
property in (16), with

$\eta = d + 2, \qquad \gamma \le \frac{\log p}{4\log c}.$

Similar observations apply for more general hybrid graphs studied in [20, Ch. 12]. 
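For experimentation, the following sketch generates a graph from a Watts-Strogatz-style hybrid model: a $d$-dimensional grid, taken here with wrap-around (a torus, which is an assumption made for simplicity), overlaid with Erdos-Renyi edges that appear independently with probability $c/p$. The function name and parameters are hypothetical; it returns both the grid edges and the full edge set so that the two components can be inspected separately.

```python
import itertools
import random


def small_world(m, d, c, seed=0):
    """Union of a d-dimensional m x ... x m torus grid and an Erdos-Renyi G(p, c/p) graph."""
    rng = random.Random(seed)
    nodes = list(itertools.product(range(m), repeat=d))
    index = {v: k for k, v in enumerate(nodes)}
    p = len(nodes)
    edges = set()
    for v in nodes:                                   # local graph: torus neighbours
        for axis in range(d):
            w = list(v)
            w[axis] = (w[axis] + 1) % m
            a, b = index[v], index[tuple(w)]
            edges.add((min(a, b), max(a, b)))
    grid_edges = set(edges)
    for a, b in itertools.combinations(range(p), 2):  # global graph: ER edges
        if rng.random() < c / p:
            edges.add((a, b))
    return p, grid_edges, edges


p, grid, edges = small_world(m=5, d=2, c=1.5)
print(p, len(grid), len(edges))
```

For $m \ge 3$ every node has exactly $2d$ grid neighbors, so the local part contributes short cycles (4-cycles when $d = 2$) while the Erdos-Renyi part adds long-range shortcuts.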

In Section B.1, we establish that the threshold in (18) for the small-world ensemble $\mathcal{G}_{\mathrm{Watts}}(p,d,c/p)$
is given by $J^*_{\mathrm{Watts}} = \Theta(1/c)$ and is independent of $d$, the degree of the grid graph. Comparing with
the threshold $J^*_{\mathrm{ER}}$ for the Erdos-Renyi ensemble $\mathcal{G}_{\mathrm{ER}}(p,c/p)$, we note that the two thresholds are
identical. This further implies that (39) holds for the small-world graph ensemble as well.

3.4. Explicit Bounds on Sample Complexity of CVDT. Recall that the sample complexity of
CVDT is required to scale as $n = \Omega(J_{\min}^{-2}\log p)$ for structural consistency in high dimensions.
Thus, the sample complexity is small when the minimum edge potential $J_{\min}$ is large. On the other
hand, $J_{\min}$ cannot be arbitrarily large due to assumption (A2) in Section 3.1, which entails that
$J_{\min} \le J^*$. The minimum sample complexity is thus attained when $J_{\min}$ achieves the threshold $J^*$.

We now provide explicit results for the minimum sample complexity for various graph ensembles,
based on the threshold $J^*$. Recall that in Section 3.3, we discussed that for the graph ensemble
$\mathcal{G}_{\mathrm{LP}}(p;\eta,\gamma,\Delta)$ satisfying the $(\eta,\gamma)$-local paths property and having maximum degree $\Delta$, the threshold
is $J^*_{\mathrm{LP}} = \Theta(1/\Delta)$. Thus, the minimum sample complexity for this graph ensemble is $n = \Omega(\Delta^2\log p)$,
attained when $J_{\min} = \Theta(1/\Delta)$.

For the Erdos-Renyi random graph ensemble $\mathcal{G}_{\mathrm{ER}}(p,c/p)$ and the small-world graph ensemble
$\mathcal{G}_{\mathrm{Watts}}(p,d,c/p)$, recall that the thresholds are given by $J^*_{\mathrm{ER}} = J^*_{\mathrm{Watts}} = \Theta(1/c)$, where $c$ is the mean
degree of the Erdos-Renyi graph. Thus, the minimum sample complexity can be improved to
$n = \Omega(c^2\log p)$ by setting $J_{\min} = \Theta(1/c)$. This implies that when the Erdos-Renyi random graphs and
small-world graphs have a bounded average degree ($c = O(1)$), the minimum sample complexity is
$n = \Omega(\log p)$. Recall that the sample complexity of learning tree models is $\Theta(\log p)$ [53]. Thus, we
observe that the complexity of learning sparse Erdos-Renyi random graphs and small-world graphs
using our algorithm CVDT is akin to learning tree structures in certain parameter regimes.

3.5. Comparison with Previous Results. We now compare the performance of our algorithm
CVDT with $\ell_1$-penalized logistic regression proposed in [48]. We first compare the computational
complexities. The method in [48] has a computational complexity of $O(p^4)$ for any input (assuming
$p > n$). On the other hand, the complexity of our method depends on the graph family under
consideration. It can be as low as $O(p^3)$ for girth-bounded ensembles, $O(p^4)$ for random graph families,
and as high as $O(p^{\Delta+2})$ for degree-bounded ensembles (without any additional characterization of the
local separation property). Clearly, our method is not efficient for general degree-bounded ensembles,
since it is tailored to exploit the sparse local-separation property of the underlying graph.

We now compare the sample complexities of the two methods. It was established that the
method in [48] has a minimum sample complexity of $n = \Omega(\Delta^3\log p)$ for a degree-bounded
ensemble $\mathcal{G}_{\mathrm{Deg}}(p,\Delta)$ satisfying certain "incoherence" conditions. The sample complexity of our CVDT
algorithm is better, at $n = \Omega(\Delta^2\log p)$. Moreover, we can guarantee an improved sample complexity of
$n = \Omega(c^2\log p)$ for Erdos-Renyi random graphs $\mathcal{G}_{\mathrm{ER}}(p,c/p)$ and small-world graphs $\mathcal{G}_{\mathrm{Watts}}(p,d,c/p)$
under the modified CVDT algorithm. Note that these random graph ensembles have maximum
degrees ($\Delta$) much larger than their average degrees ($c$), and thus we can provide stronger sample
complexity results. Moreover, our algorithm is local and requires only low-order statistics for any
class of graphical models of arbitrary order, while the method in [48] requires full-order statistics,
since it undertakes neighborhood selection through regularized logistic regression. This is relevant
in practice, since our algorithm is better equipped to handle missing samples.

The incoherence conditions required for the success of $\ell_1$-penalized logistic regression in [48] are
NP-hard to establish for general models, since they involve the partition function of the model [6].
In contrast, our conditions are transparent and relate to the phase transitions in the model. It is an
open question whether the incoherence conditions are implied by our assumptions or vice versa
for general models. It appears that our conditions are weaker than the incoherence conditions for
random graph models. For instance, for the Erdos-Renyi model $\mathcal{G}_{\mathrm{ER}}(p,c/p)$, we require that $J_{\max} =
O(1/c)$, where $c$ is the average degree, while a sufficient condition for incoherence is $J_{\max} = O(1/\Delta)$,
where $\Delta$ is the maximum degree. Note that $\Delta = O(\log p\log c)$ a.a.s. for the Erdos-Renyi model.
Similar observations also hold for the power-law and small-world graph ensembles. This implies that
we can guarantee consistent structure estimation under weaker conditions (i.e., a wider range of
parameters) and with better sample complexity for the Erdos-Renyi, power-law and small-world models.

4. Necessary Conditions for Graph Estimation. We have so far proposed algorithms and
provided performance guarantees for graph estimation given samples from an Ising model. We now
analyze necessary conditions for graph estimation.

4.1. Erdos-Renyi random graphs. Necessary conditions for graph estimation have been
previously characterized for degree-bounded graph ensembles $\mathcal{G}_{\mathrm{Deg}}(p,\Delta)$ [50]. However, these conditions
are too loose to be useful for the ensemble of Erdos-Renyi graphs $\mathcal{G}_{\mathrm{ER}}(p,c/p)$, where the average
degree¹³ ($c$) is much smaller than the maximum degree.

We now provide a lower bound on sample complexity for graph estimation of Erdos-Renyi graphs 
using any deterministic estimator. Recall that p is the number of nodes in the model and n is the 
number of samples. In the following result, c is allowed to depend on p and is thus more general 
than the previous results. 

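Theorem 3 (Necessary Conditions for Model Selection). Assume that $c < 0.5p$ and $G_p \sim
\mathcal{G}_{\mathrm{ER}}(p,c/p)$. Then if $n < \epsilon c\log p$ for sufficiently small $\epsilon > 0$, we have

(40) $\lim_{p\to\infty} P\big[\widehat{G}_p(\mathbf{X}^n) \ne G_p\big] = 1$

for any deterministic estimator $\widehat{G}_p$.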



Thus, when $n < \epsilon c\log p$ for sufficiently small $\epsilon > 0$, the probability of error for structure
estimation tends to one, where the probability measure is with respect to both the Erdos-Renyi random
graph and the samples. The proof of this theorem can be found in Section D and proceeds along the
lines of [11, Thm. 1].

The result in Theorem 3 provides an asymptotic necessary condition for structure learning and 
involves an additional auxiliary parameter e. In the following result, we remove the requirement for 
the auxiliary parameter e and provide a non-asymptotic necessary condition, but at the expense of 
having a weak (instead of a strong) converse. 

Theorem 4 (Non-Asymptotic Necessary Conditions for Model Selection). Assume that $G_p \sim
\mathcal{G}_{\mathrm{ER}}(p,c/p)$, where $c$ may depend on $p$. Let $P_e^{(p)} := P(\widehat{G}_p \ne G_p)$ be the probability of error. If
$P_e^{(p)} \to 0$, the number of samples $n$ must satisfy

(41) $n \ge \frac{1}{p}\binom{p}{2} H_b\Big(\frac{c}{p}\Big).$

¹³The techniques in this section are applicable when the average sparsity parameter $c$ of the $\mathcal{G}_{\mathrm{ER}}(p,c/p)$ ensemble is a
function of $p$ and satisfies $c < p/2$.

By expanding the binary entropy function $H_b(\cdot)$, it is easy to see that the statement in (41) can
be weakened to the more easily interpretable (albeit weaker) necessary condition:

(42) $n \ge \frac{c}{2}\log_2\frac{p}{c}.$

The above result differs from Theorem 3 in two aspects: the bound in (41) does not involve any 
asymptotic notation and is a weak converse result (instead of a strong converse). 

Remarks:

1. Thus, $n = \Omega(c\log p)$ samples are necessary for structure recovery. Hence, the larger
the average degree, the higher the required sample complexity. Intuitively, this is because as $c$
grows, the graph becomes denser and hence we require more samples for learning. In
information-theoretic terms, Theorem 3 is a strong converse [21], since we show that the error probability
of structure learning tends to one (instead of being merely bounded away from zero). On the
other hand, the result in Theorem 4 is a weak converse result.

2. In [50], it is shown that for graphs uniformly drawn from the class of graphs with maximum
degree $\Delta$, when $n < \epsilon\Delta^k\log p$ for some $k \in \mathbb{N}$, there exists a graph for which any estimator
fails with probability at least 0.5. These results cannot be applied here, since the probability
mass function is non-uniform for the class of Erdos-Renyi random graphs.

3. The result does not depend on the Ising model assumption and holds for any pairwise discrete
Markov random field (i.e., $\mathcal{X}$ is a finite set).

We now provide an outline of the proof of Theorem 4. A naive application of Fano's inequality
to this problem does not yield any meaningful result, since the set of all graphs (which can be
realized by $\mathcal{G}_{\mathrm{ER}}$) is "too large". We employ another information-theoretic idea known as typicality.
We identify a set of graphs with $p$ nodes whose average degree is $\epsilon$-close to $c$ (which is the expected
degree for $\mathcal{G}_{\mathrm{ER}}(p,c/p)$). The set of typical graphs has a small cardinality but high probability when
$p$ is large. The novelty of our proof lies in our use of both typicality and Fano's inequality
to derive necessary conditions for structure learning. We can show that (i) the probability of the
typical set tends to one as $p \to \infty$, (ii) the graphs in the typical set are almost uniformly distributed
(the asymptotic equipartition property), and (iii) the cardinality of the typical set is small relative to
the set of all graphs. A detailed discussion of these techniques is given in [4].

4.2. Other Graph Families. We now provide necessary conditions for recovery of graphs belonging
to various graph ensembles considered in this paper. We first recap the result of [11, Thm. 1],
which is applicable to any uniform ensemble of graphs.

Theorem 5 (Lower Bound on Sample Complexity). Assume that a graph $G_p$ on $p$ nodes is
uniformly drawn from an ensemble $\mathcal{G}$. Given $n$ i.i.d. samples from an Ising model Markov on $G_p$,
we have

(43) $P\big[\widehat{G}_p(\mathbf{X}^n) \ne G_p\big] \ge 1 - \frac{np}{\log_2|\mathcal{G}|}$

for any deterministic estimator $\widehat{G}_p$.
We provide bounds on the number of graphs in specific graph families considered earlier in the
paper, which give us necessary conditions for their recovery.

Lemma 2 (Bounds on Size of Graph Families). The following bounds hold:

1. For girth-bounded ensembles $\mathcal{G}_{\mathrm{Girth}}(p;g,\Delta_{\min},\Delta_{\max},k)$ with girth $g$, minimum degree $\Delta_{\min}$,
maximum degree $\Delta_{\max}$ and number of edges $k$, we have

(44) /(p - 5AU)^ < |gGirth(p; 5, A^in, A,„ax, k)\ < p\p - A^;J^ 

2. For local-path ensembles $\mathcal{G}_{\mathrm{LP}}(p;\eta,\gamma,\Delta_{\min},\Delta_{\max},k)$ having $\eta$ paths of length less than $\gamma$
between any two nodes, minimum degree $\Delta_{\min} > 0$, maximum degree $\Delta_{\max}$ and number of
edges $k$,



mip ^ [p — 7A2 



mm I 

2 



< |Slp(p; 1], 7, Amin, Amax, k)\ 



7 Nte/lAiax^^ -*- 



(45) <m2p'Hp-Kinr{'' 2) 

where $k_1 := k - m_2(\eta - 1)$, $k_2 := k - m_1(\eta - 1)$, $m_1 :=$



7A2 



and 771-2 



3. For augmented ensembles $\mathcal{G}_{\mathrm{Aug}}(p;d,\eta,\gamma,\Delta_{\min},\Delta_{\max},k)$ consisting of a local graph with
(regular) degree $d$ and a global graph $\mathcal{G}_{\mathrm{LP}}(p;\eta,\gamma,\Delta_{\min},\Delta_{\max},k)$, we have



"''^ip-l^l..f^{^f-'"~'''-'' 



mip'^i(p-7A^axJ'^H r"J [^d ) ^ |SAug(p;d,ry,7,Amin,Amax,A;)| 
(46) < m,p''^{p-Alj'''^C^I->^)'^'r/), 

where k'l := /ci + 1 — ^ and k'2 := k2 + 1 — ^, for ki, k2,'mi,m2 defined previously. 

The proof of the above result is given in Section D.2. 

Remarks: Using the above lower bounds on the number of graphs in a given family,
in conjunction with Theorem 5, we can obtain necessary conditions for different graph families.
For instance, for girth-constrained families, when the girth $g$ and maximum degree $\Delta_{\max}$ scale as
$O(\mathrm{poly}\log p)$, we have that

(47) $n = \Omega\Big(\frac{k}{p}\log p\Big)$

samples are necessary for structure estimation, where $k$ is the number of edges. Similarly,
for local-path ensembles, when the path threshold $\gamma$ and maximum degree $\Delta_{\max}$ scale as
$O(\mathrm{poly}\log p)$, the above bound in (47) changes only slightly, and we have



n = Q 



k r] — 1 



p 



A 



Iogp 



as the necessary condition, by substituting for $k_1$ and noting that the other terms scale slower than
$\log p$ under the above specified regime. Similarly, for augmented graphs, we have



n = Q 



k r] — 1 
p 



d 



A^. 2 

min 



logp 



as the necessary condition. Thus, for a wide class of graphs, we can characterize necessary conditions 
for structure estimation. 
5. Conclusion. In this paper, we adopted a novel and unified paradigm for Ising model
selection. We presented a simple local algorithm for structure estimation with low computational
and sample complexities under a set of mild and transparent conditions. Based on a local separation
criterion, this algorithm succeeds on a wide range of graph ensembles, such as the Erdos-Renyi
ensemble and small-world networks.
APPENDIX A: PRELIMINARIES AND TOOLS

Notation. For any two functions $f(p), g(p)$, $f(p) = O(g(p))$ if there exists a constant $c$ such that
$f(p) \le c\,g(p)$ for all $p \ge p_0$ for a fixed $p_0 \in \mathbb{N}$. Similarly, $f(p) = \Omega(g(p))$ if there exists a constant $c'$
such that $f(p) \ge c'g(p)$ for all $p \ge p_0$ for a fixed $p_0 \in \mathbb{N}$, and $f(p) = \Theta(g(p))$ if $f(p) = \Omega(g(p))$ and
$f(p) = O(g(p))$. Also, $f(p) = o(g(p))$ when $f(p)/g(p) \to 0$ and $f(p) = \omega(g(p))$ when $f(p)/g(p) \to \infty$
as $p \to \infty$. We use the notation $f(p) = \tilde{O}(g(p))$ if $f(p) \le c\,g(p)\log p$ for some constant $c$ and for
all $p \ge p_0$. Similarly, we have $f(p) = \tilde{\omega}(g(p))$ if $\frac{f(p)}{g(p)\log p} \to \infty$ and $f(p) = \tilde{o}(g(p))$ if $\frac{f(p)}{g(p)\log p} \to 0$,
as $p \to \infty$.

For a graph $G$, let $V(G)$ denote the vertex set of $G$. Let $\mathcal{N}(i)$ denote the neighbors of node
$i$ and $\mathcal{N}[i]$ denote the closed neighborhood, i.e., including node $i$ as well. We let $\mathrm{Path}(i,j;G) =
\mathrm{Path}_1(i,j;G)$ denote the subgraph spanning the corresponding shortest path and $d(i,j;G) :=
|\mathrm{Path}(i,j;G)|$ denote the graph distance, or shortest-path distance, between nodes $i$ and $j$. Let
the set of nodes at distance¹⁴ exactly $l$ from $i$ in $G$ be denoted as

(48) $B_l(i;G) := \{k \in V : d(i,k;G) = l\}.$

Let $\mathrm{Path}_l(i,j;G)$ denote¹⁵ the $l$-th shortest path from $i$ to $j$ and $d_l(i,j;G)$ the corresponding length
of the path. Let $N^{\mathrm{path}}_l(i,j;G)$ denote the number of paths of length $l$ from node $i$ to node $j$ in $G$
without repeating any node in the intermediate steps.
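The spheres $B_l(i;G)$ in (48) are exactly the breadth-first-search layers around $i$. A small sketch, with a hypothetical function name and an adjacency-dictionary representation, computes them; nodes unreachable from $i$ simply do not appear in the result.

```python
from collections import deque


def spheres(adj, i):
    """Map l -> B_l(i; G), the set of nodes at graph distance exactly l from i."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    B = {}
    for node, l in dist.items():
        B.setdefault(l, set()).add(node)
    return B


c6 = {v: [(v - 1) % 6, (v + 1) % 6] for v in range(6)}
print(spheres(c6, 0))  # {0: {0}, 1: {1, 5}, 2: {2, 4}, 3: {3}}
```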

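Denote the correlation between any two variables $X_i$ and $X_j$, $i,j \in V$, as

(49) $C(i,j) := \mathbb{E}[X_i X_j].$

Given $n$ samples $\mathbf{x}_i^n, \mathbf{x}_j^n$ drawn i.i.d. from $X_i, X_j$, let $\widehat{C}(i,j;\mathbf{x}_i^n,\mathbf{x}_j^n)$ denote the empirical correlation
between nodes $i$ and $j$, defined as

(50) $\widehat{C}_{ij} := \widehat{C}(i,j;\mathbf{x}_i^n,\mathbf{x}_j^n) := \frac{1}{n}\sum_{k=1}^{n} x_{i,k}\,x_{j,k}.$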
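Computing (50) for all pairs at once amounts to a single matrix product. The sketch below assumes the samples are stacked as the rows of an $n \times p$ array of $\pm 1$ entries; the function name is hypothetical.

```python
import numpy as np


def empirical_correlation(X):
    """Matrix of C_hat_{ij} = (1/n) sum_k x_{i,k} x_{j,k} for +/-1 samples X of shape (n, p)."""
    X = np.asarray(X, dtype=float)
    return X.T @ X / X.shape[0]


X = [[1, 1], [1, -1], [-1, -1], [1, 1]]
C_hat = empirical_correlation(X)
print(C_hat[0, 1])  # 0.5
```

The diagonal is identically one, since $x_{i,k}^2 = 1$ for $\pm 1$ variables.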

¹⁴We follow the convention that if $l$ is not an integer, the distance is $\lfloor l \rfloor$.
¹⁵We abbreviate $\mathrm{Path}_1(i,j;G)$ as $\mathrm{Path}(i,j;G)$ and $d_1(i,j;G)$ as $d(i,j;G)$.






For any distributions $P, Q$ on a finite alphabet $\mathcal{X}$, recall that $\nu(P,Q)$ denotes the total variation
distance, given by

(51) $\nu(P,Q) := \frac{1}{2}\|P - Q\|_1 = \frac{1}{2}\sum_{x \in \mathcal{X}} |P(x) - Q(x)|.$

A.1. Analysis of Ising Models on Trees. We first derive simple expressions for Ising models
Markov on trees. These will later be used upon reduction of general models to tree models via the
self-avoiding walk tree construction. We first note the correlation between any node pair in a tree
model.

Fact 1 (Markov Property for Correlations on a Tree). For a symmetric Ising model ($h = 0$)
Markov on a tree $T$, the correlation is given by

(52) $C(i,j;T) = \prod_{(k,l) \in \mathrm{Path}(i,j;T)} C(k,l;T), \quad \forall\, i,j \in V,$

and the correlation between any two neighbors is

(53) $C(i,j;T) = \tanh(J_{i,j}), \quad \forall\, (i,j) \in T.$

Proof: Eqn. (52) is obtained by successive conditioning on the intermediate nodes in the path
between $i$ and $j$ in the tree $T$. Eqn. (53) is a consequence of the form of the symmetric Ising model. □
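Fact 1 is easy to confirm by exhaustive enumeration on a small tree. The sketch below assumes a hypothetical four-node tree (node 1 is the hub) with arbitrary couplings and zero fields, and compares the brute-force correlation with the product of $\tanh$ terms along the path given by (52)-(53).

```python
import itertools
import math


def correlation(p, J, i, j):
    """E[X_i X_j] under a zero-field Ising model with couplings J, by exhaustive enumeration."""
    num = Z = 0.0
    for x in itertools.product([-1, 1], repeat=p):
        w = math.exp(sum(Jab * x[a] * x[b] for (a, b), Jab in J.items()))
        Z += w
        num += w * x[i] * x[j]
    return num / Z


# Hypothetical tree: edges 0-1, 1-2, 1-3.
J = {(0, 1): 0.5, (1, 2): -0.8, (1, 3): 1.2}
lhs = correlation(4, J, 0, 2)
rhs = math.tanh(0.5) * math.tanh(-0.8)   # product over Path(0, 2; T), using (53)
print(lhs, rhs)
```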

Given an Ising model $P$ Markov on $G$, define a corresponding model $\tilde{P}$ obtained by setting all
the node potentials $h_i$ to zero and all the edge potentials $J_{ij}$ to their corresponding absolute values
$|J_{ij}|$. We term $\tilde{P}$ the corresponding symmetric attractive model for $P$. We make the following
observation.

Proposition 1 (Dominance by Symmetric Attractive Model on Trees). For an Ising model $P$
Markov on a tree $T$ and $\tilde{P}$ its corresponding symmetric attractive model, we have

(54) $\big\|P[X_i|X_j=+;T] - P[X_i|X_j=-;T]\big\|_1 \le \big\|\tilde{P}[X_i|X_j=+;T] - \tilde{P}[X_i|X_j=-;T]\big\|_1.$

Proof: The proof is along the lines of [7, Lemma 4.1], but we make the simple observation that
it also holds when the model $P$ is not necessarily attractive (or ferromagnetic).

We first note that it suffices to show (54) for the special case when $P$ is a Markov chain on
$k+2$ variables, for some $k \in \mathbb{N}$, i.e., the tree $T$ is a path graph $T = i,1,\ldots,k,j$ with $i$ and $j$
as endpoints. This is because we can reduce the conditional probability $P[X_i|X_j;T]$ on any tree $T$
to a corresponding conditional probability on the path from $i$ to $j$ by suitably modifying the node
potentials. See [7, Lemma 4.1] for details.

We now show that (54) holds when the tree is a path $T_k := i,1,\ldots,k,j$, for all $k \in \mathbb{N}$, by
induction on $k$. For $k = 1$ (path of length two), we have¹⁶

$\big\|P[X_i|X_j=+;T_1] - P[X_i|X_j=-;T_1]\big\|_1 = \left|\frac{e^{J_{i,j}+h_i} - e^{-J_{i,j}-h_i}}{e^{J_{i,j}+h_i} + e^{-J_{i,j}-h_i}} + \frac{e^{J_{i,j}-h_i} - e^{-J_{i,j}+h_i}}{e^{J_{i,j}-h_i} + e^{-J_{i,j}+h_i}}\right|$

$\qquad = |\tanh(J_{i,j} + h_i) + \tanh(J_{i,j} - h_i)|$

(55) $\qquad = \tanh(|J_{i,j}| + h_i) + \tanh(|J_{i,j}| - h_i)$

$\qquad \le \big\|\tilde{P}[X_i|X_j=+;T_1] - \tilde{P}[X_i|X_j=-;T_1]\big\|_1.$

¹⁶Note the simple fact that $\|P[X_i|X_j=+] - P[X_i|X_j=-]\|_1 = |\mathbb{E}[X_i|X_j=+] - \mathbb{E}[X_i|X_j=-]|$. The result in [7,
Lemma 4.1] is expressed in terms of expectations.

The expression in (55) has a unique maximum at $h_i = 0$, which yields the subsequent inequality.
The induction step on $k$ now proceeds as in [7, Lemma 4.1], and we have the result. □
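The base case can be checked numerically. The sketch below assumes two spins with coupling $J$ and field $h$ on $X_i$, computes $|\mathbb{E}[X_i|X_j=+] - \mathbb{E}[X_i|X_j=-]|$ by direct enumeration over a grid of $h$ values, compares it with the closed form (55), and confirms that the maximum over the grid sits at $h = 0$, where it equals $2\tanh|J|$, the value for the symmetric attractive model. The coupling value and grid are arbitrary choices.

```python
import math


def cond_mean(J, h, xj):
    """E[X_i | X_j = xj] for two spins with coupling J and field h on X_i (enumeration)."""
    w = {xi: math.exp(J * xi * xj + h * xi) for xi in (-1, 1)}
    return (w[1] - w[-1]) / (w[1] + w[-1])


def l1_gap(J, h):
    """||P[X_i|X_j=+] - P[X_i|X_j=-]||_1 = |E[X_i|X_j=+] - E[X_i|X_j=-]|."""
    return abs(cond_mean(J, h, 1) - cond_mean(J, h, -1))


J = -0.7                                   # the sign of the coupling is irrelevant
hs = [k / 10 for k in range(-30, 31)]
gaps = [l1_gap(J, h) for h in hs]
closed_form = [math.tanh(abs(J) + h) + math.tanh(abs(J) - h) for h in hs]   # (55)
best_h = hs[gaps.index(max(gaps))]
print(best_h, max(gaps), 2 * math.tanh(abs(J)))
```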

A.2. Self-Avoiding Walk Tree Construction. We now review the notion of a self-avoiding
walk (SAW) tree for graphical models with binary variables, first introduced in [60]. Given an Ising
model Markov on a general graph $G$ and a particular node $i \in V$, the corresponding SAW tree
rooted at $i$ is denoted by $T_{\mathrm{SAW}}(i;G)$. It is essentially the tree of self-avoiding walks originating from
node $i$, except that whenever a cycle in $G$ is closed by the walk, a terminal node is included in
$T_{\mathrm{SAW}}(i;G)$ and is fixed to be either $+1$ or $-1$; the actual value is determined by the direction in
which the cycle is traversed by the walk (for instance, by convention, we can fix terminal nodes upon
clockwise traversal of cycles as $+1$). Let $A$ denote the set of all terminal nodes in $T_{\mathrm{SAW}}(i;G)$ and
$x_A$ the corresponding fixed configuration. In effect, $T_{\mathrm{SAW}}(i;G)$ involves conditioning with respect
to the terminal nodes $A$. See Fig. 2 for an illustration.

We now recap a powerful result of [60] that $T_{\mathrm{SAW}}(i;G)$ preserves the marginal and conditional
distributions of node $i$ with respect to the original graph $G$. Recall that $N^{\mathrm{path}}_l(i,Q;G) = \sum_{q \in Q} N^{\mathrm{path}}_l(i,q;G)$
denotes the number of paths of length $l$ from $i$ to a set $Q \subset V$ in $G$, $d(i,Q;G) = \min_{q \in Q} d(i,q;G)$
denotes the graph distance, and $S(i,Q;G) = \cup_{q \in Q} S(i,q;G)$ denotes a vertex separator between $i$
and $Q$ in $G$. Let

(56) $\mathcal{U}(j;T_{\mathrm{SAW}}(i;G)) = \{j_1, \ldots, j_{|\mathcal{U}(j;T_{\mathrm{SAW}}(i;G))|}\} \subset V(T_{\mathrm{SAW}}(i;G))$

denote the set of copies of a node $j \ne i$ in the self-avoiding walk tree $T_{\mathrm{SAW}}(i;G)$. The definition is
extended to sets $Q \subset V$ as $\mathcal{U}(Q;T_{\mathrm{SAW}}(i;G)) := \cup_{q \in Q} \mathcal{U}(q;T_{\mathrm{SAW}}(i;G))$.

Theorem 6 (Properties of $T_{\mathrm{SAW}}(i;G)$). The following properties hold for the self-avoiding walk
tree $T_{\mathrm{SAW}}(i;G)$:

1. The marginal and conditional distributions of node $i$ are preserved:

(57) $P(x_i;G) = P(x_i \mid x_A; T_{\mathrm{SAW}}(i;G)),$

(58) $P(x_i \mid x_Q; G) = P(x_i \mid x_{\mathcal{U}(Q)}, x_A; T_{\mathrm{SAW}}(i;G)),$

for a fixed configuration $x_A$ on the set of terminal nodes $A$, and for any set $Q \subset V \setminus \{i\}$.

2. The paths in $G$ from node $i$ to any set $Q$ are preserved in $T_{\mathrm{SAW}}(i;G)$:

(59) $N^{\mathrm{path}}_l(i,Q;G) = N^{\mathrm{path}}_l(i,\mathcal{U}(Q);T_{\mathrm{SAW}}(i;G)), \quad \forall\, l \in \mathbb{N},\ Q \subset V \setminus \{i\}.$

3. The graph distances from node $i$ in $G$ and $T_{\mathrm{SAW}}(i;G)$ are equal:

(60) $d(i,Q;G) = d(i,\mathcal{U}(Q);T_{\mathrm{SAW}}(i;G)), \quad \forall\, Q \subset V \setminus \{i\}.$

4. The cardinalities of the vertex separators are preserved:

(61) $|S(i,Q;G)| = |S(i,\mathcal{U}(Q);T_{\mathrm{SAW}}(i;G))|, \quad \forall\, Q \subset V \setminus \{i\}.$

5. The maximum degrees in $G$ and $T_{\mathrm{SAW}}(i;G)$ are equal.
[Figure: (a) A graph; (b) Its self-avoiding walk tree]

Fig 2. The figure on the right is the self-avoiding walk tree $T_{\mathrm{SAW}}(i;G)$ rooted at node $i$ for the graph shown on the left.
The set $\mathcal{U}(j)$ is the set of copies of node $j$ and the set $A$ is the set of terminal nodes in $T_{\mathrm{SAW}}(i;G)$.



Proof: Property (1) is proven in [60]. It involves a recursive expression for marginal and conditional
distributions of node $i$. Property (2) holds by definition, since $T_{\mathrm{SAW}}(i;G)$ is constructed by
self-avoiding walks from node $i$. Properties (3), (4) and (5) depend only on the paths in the graph and
are thus preserved. □

Thus, for any graph $G$, we have a tree representation $T_{\mathrm{saw}}(i;G)$ which preserves many properties with respect to node $i$. However, in general, the tree $T_{\mathrm{saw}}(i;G)$ can have an exponential number of nodes (compared to $G$) and thus, we cannot use $T_{\mathrm{saw}}(i;G)$ directly. This is also true for the class of graphs considered in this paper. However, the bound on the maximum edge potentials and conditioning on local separators allow us to limit the neighborhoods under consideration.

We note the following property of graphs with the local-paths property. Recall that a graph ensemble $\mathcal G_{\mathrm{LP}}(p;\eta,\gamma)$ satisfies the $(\eta,\gamma)$-local paths property if there are at most $\eta$ paths of length less than $\gamma$ between any two nodes.

Lemma 3 (Neighborhood Size of $T_{\mathrm{saw}}(i;G)$ for Graphs with Local-Paths Property). For a.e. $G\sim\mathcal G_{\mathrm{LP}}(p;\eta,\gamma)$ satisfying the $(\eta,\gamma)$-local paths property as per Definition 2, we have

$$|B_l(i;T_{\mathrm{saw}}(i;G))| \le \eta\,|B_l(i;G)|,\quad\forall l<\gamma. \tag{62}$$

Proof: Recall that a.e. $G\sim\mathcal G_{\mathrm{LP}}(p;\eta,\gamma)$ has at most $\eta$ paths of length smaller than $\gamma$ between any two nodes. Using Property (2) in Theorem 6, this implies that there are at most $\eta$ copies of any node $j\neq i$ in $T_{\mathrm{saw}}(i;G)$, and at most $\eta$ terminal nodes in $A$, at distance at most $\gamma$ from $i$ in $T_{\mathrm{saw}}(i;G)$. Thus (62) holds. □

APPENDIX B: CONDITIONAL VARIATION DISTANCE TEST 

B.1. Conditional Uniqueness Regime. We now characterize a sufficient condition for structure estimation of Ising models and term it the conditional uniqueness regime. In Section B, we will see that Definition 3 leads to structural consistency of the proposed CVDT algorithm. We use the term "conditional uniqueness regime", since it is similar to the so-called uniqueness regime†,



† Roughly, the uniqueness condition states that asymptotically, as the number of variables $p\to\infty$, any marginal distribution of variables in a local neighborhood of the graph is asymptotically independent of faraway variables. Refer to [28, 43] for details.
but involves the conditional distributions instead of the marginal distribution. Our condition, stated below, is in fact a weaker condition than the usual notion of the uniqueness regime.
Notation: Given a graph $G=(V,E)$, a graphical model $P$ Markov on $G$, and any subset $A\subset V$, let $P[X_A;G]$ denote the marginal distribution‡ of the variables in $A$. Recall that $d(i,j;G)$ denotes the graph distance, $B_l(i;G)$ denotes the set of nodes within graph distance $l$ from node $i$, and $\partial B_l(i)$ denotes the boundary nodes, i.e., nodes at distance exactly $l$ from node $i$.

Definition 3 (Conditional Uniqueness Regime). A discrete graphical model $P$ Markov on graph $G\sim\mathcal G(p)$ is in the conditional uniqueness regime if there exists $\alpha\in(0,1)$ such that for a.e. $G$ and all $l\in\mathbb N$§,

$$\max_{\substack{(i,j)\notin G\\ x_{S_l}\in\mathcal X^{|S_l|}}}\big\|P[X_i|X_j=+,X_{S_l}=x_{S_l}] - P[X_i|X_j=-,X_{S_l}=x_{S_l}]\big\|_1 = \tilde O(\alpha^l), \tag{63}$$

where $S_l := S(i,j;G,l)$ is the minimal $l$-local separator between $i$ and $j$, according to Definition 1.

We now show that a sufficient condition for the conditional uniqueness condition in (63) to hold for Ising models is for the maximum absolute edge potential to satisfy

$$J_{\max} < J^*, \tag{64}$$

where the threshold $J^*\in\mathbb R^+$ is the largest value which satisfies¶, for all $l\in\mathbb N$,

$$\max\,|\partial B_l(i;T_{\mathrm{saw}}(i;F'_{S_l}))| = O\big((\tanh J^*)^{-l}\big), \tag{65}$$

where $F'_{S_l} := G(V\setminus S_l)$ is the subgraph of $G$ obtained by removing the nodes in $S_l$, the minimal $l$-local separator, and $T_{\mathrm{saw}}(i;F'_{S_l})$ is the corresponding self-avoiding walk tree rooted at $i$. Define

$$\alpha := \frac{\tanh J_{\max}}{\tanh J^*} < 1. \tag{66}$$

We now characterize $J^*$ in terms of the self-avoiding walk tree.

Lemma 4 (Sufficient Conditions for Conditional Uniqueness via $T_{\mathrm{saw}}(i;G)$). The Ising model satisfying (64) is in the conditional uniqueness regime according to (63) with rate $\alpha$ given by (66), where the threshold $J^*$ is given by (65).

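Proof: Abbreviate the $l$-local separator as $S := S(i,j;G,l)$. We have, for $i\in V$,

$$\begin{aligned}
&\|P[X_i|X_j=+,X_S=x_S] - P[X_i|X_j=-,X_S=x_S]\|_1\\
&\quad= \|P[X_i|X_{\mathcal U(j)}=+,X_{\mathcal U(S)}=x_{\mathcal U(S)},X_A=x_A;T_{\mathrm{saw}}(i;G)]\\
&\qquad - P[X_i|X_{\mathcal U(j)}=-,X_{\mathcal U(S)}=x_{\mathcal U(S)},X_A=x_A;T_{\mathrm{saw}}(i;G)]\|_1
\end{aligned} \tag{67}$$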



‡ In the sequel, we abuse notation by using $P[X_i;G]$ to refer to the vector of length $|\mathcal X|$ containing the values of the pmf $P_{X_i;G}$.

§ In Definition 3, we let $l$ scale as a function of $p$, albeit under some restrictions depending on the graph ensemble. See Corollary 1 for some examples.

¶ In (65), we let $l$ scale as a function of $p$, albeit under some restrictions depending on the graph ensemble. This implies that Definition 3 is satisfied for these regimes of $l$. See Corollary 1 for some examples.
from Property (1) in Theorem 6 for self-avoiding walk trees, for a certain configuration $x_A$ over the set of terminal nodes $A$.

Recall that $\mathcal U(j;T_{\mathrm{saw}}(i;G))$ denotes the set of copies of node $j$ in $T_{\mathrm{saw}}(i;G)$. Recall also that in $T_{\mathrm{saw}}(i;G)$, each path starting from the root node $i$ has exactly one copy of the nodes in $S\cup\{j\}$ (if a node is encountered again, a terminal node is added to $T_{\mathrm{saw}}(i;G)$). Denote by $\mathcal U_1(j;T_{\mathrm{saw}}(i;G))\subset\mathcal U(j;T_{\mathrm{saw}}(i;G))$ the set of copies of node $j$ that are encountered before the copies of nodes in $S$ along the paths from $i$ in $T_{\mathrm{saw}}(i;G)$. Similarly, $\mathcal U_1(S;T_{\mathrm{saw}}(i;G))\subset\mathcal U(S;T_{\mathrm{saw}}(i;G))$ denotes the set of copies of $S$ encountered before the copies of $j$. Let $\mathcal U_2(j;T_{\mathrm{saw}}(i;G)) := \mathcal U(j;T_{\mathrm{saw}}(i;G))\setminus\mathcal U_1(j;T_{\mathrm{saw}}(i;G))$, and define $\mathcal U_2(S;T_{\mathrm{saw}}(i;G))$ similarly. See Fig. 3. By definition, $X_{\mathcal U_2(j)} - X_{\mathcal U_1(S)} - X_i - X_{\mathcal U_1(j)} - X_{\mathcal U_2(S)}$ forms a Markov chain, and thus,

$$P(X_i|X_{\mathcal U(j)},X_{\mathcal U(S)},X_A;T_{\mathrm{saw}}(i;G)) = P(X_i|X_{\mathcal U_1(j)},X_{\mathcal U_1(S)},X_A;T_{\mathrm{saw}}(i;G)).$$

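Substituting this equivalence into (67), we have

$$\begin{aligned}
&\|P[X_i|X_j=+,X_S=x_S]-P[X_i|X_j=-,X_S=x_S]\|_1\\
&\quad= \|P[X_i|X_{\mathcal U_1(j)}=+,X_{\mathcal U_1(S)}=x_{\mathcal U_1(S)},X_A=x_A;T_{\mathrm{saw}}(i;G)]\\
&\qquad\quad - P[X_i|X_{\mathcal U_1(j)}=-,X_{\mathcal U_1(S)}=x_{\mathcal U_1(S)},X_A=x_A;T_{\mathrm{saw}}(i;G)]\|_1\\
&\quad\overset{(a)}{\le} \|\bar P[X_i|X_{\mathcal U_1(j)}=+;T_{\mathrm{saw}}(i;G)]-\bar P[X_i|X_{\mathcal U_1(j)}=-;T_{\mathrm{saw}}(i;G)]\|_1\\
&\quad\overset{(b)}{\le} \|\bar P[X_i|X_{\mathcal U_1(j)}=+;T_{\mathrm{saw}}(i;F'_S)]-\bar P[X_i|X_{\mathcal U_1(j)}=-;T_{\mathrm{saw}}(i;F'_S)]\|_1\\
&\quad\overset{(c)}{\le} \|\bar P[X_i|X_{\partial B_l(i)}=+;T_{\mathrm{saw}}(i;F'_S)]-\bar P[X_i|X_{\partial B_l(i)}=-;T_{\mathrm{saw}}(i;F'_S)]\|_1\\
&\quad\overset{(d)}{\le} 2\,|\partial B_l(i;T_{\mathrm{saw}}(i;F'_S))|\,(\tanh J_{\max})^l,
\end{aligned}$$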

where Inequality (a) is obtained by applying Proposition 1 and involves the symmetric attractive counterpart $\bar P$ of $P$, obtained by setting all the node potentials $h_k = 0$ for all $k\in V(T_{\mathrm{saw}}(i;G))$. Note that conditioning on a random variable $X_k$ to be $+$ (resp. $-$) is equivalent to setting its node potential $h_k$ to $\infty$ (resp. $-\infty$) and erasing the sub-tree beyond node $k$. Thus, dropping the conditioning and setting the node potentials to zero yields an upper bound in (a).

For Inequality (b), note that in $T_{\mathrm{saw}}(i;G)$, the paths from node $i$ to $\mathcal U_1(j)$ and to $\mathcal U_1(S)$ are disjoint (except for node $i$). Thus, the conditional distribution of $X_i$ given $\mathcal U_1(j)$ on $T_{\mathrm{saw}}(i;G)$ is equivalent to a conditional distribution on $T_{\mathrm{saw}}(i;F'_S)$, obtained by marginalizing out the nodes on paths containing $\mathcal U_1(S)$ and suitably changing the node potential of node $i$ (see [22, Lemma 4.1] for an exact characterization of such a marginalization). Applying Proposition 1, we obtain an upper bound by setting the node potential in $T_{\mathrm{saw}}(i;F'_S)$ to zero, i.e., by the model $\bar P$ on $T_{\mathrm{saw}}(i;F'_S)$.

For Inequality (c), recall that by the definition of an $l$-local separator, the set $\mathcal U_1(j)$ is at distance at least $l$ from node $i$. Thus, $X_i - X_{\partial B_l(i)} - X_{\mathcal U_1(j)}$ forms a Markov chain, and in the attractive model $\bar P$, inequality (c) holds.

Fig 3. Illustration of sets on $T_{\mathrm{saw}}(i;G)$, the self-avoiding walk tree at node $i$ corresponding to the graph in Fig. 1, where $S = \{a,b\}$ in the graph in Fig. 1. The nodes $j_1$, $j_2$ and $j_3$ are the copies of $j$ in $T_{\mathrm{saw}}(i;G)$, and similarly for nodes in $S$. The set $A$ is the set of terminal nodes in $T_{\mathrm{saw}}(i;G)$. The set $\mathcal U_1(j)$ separates $\mathcal U_2(S)$ from $i$ and vice versa.

Inequality (d) involves considering a telescoping sum over a sequence of configurations $\lambda^0,\ldots,\lambda^{|\partial B_l(i)|}$ on $\partial B_l(i)$, from the all-$+$ configuration to the all-$-$ configuration, where consecutive vectors $\lambda^t$ and $\lambda^{t+1}$ differ in a single coordinate, i.e., the configuration at a single node is changed while the others are kept fixed. See [45, Lemma 2.8] for a detailed discussion of this step. In particular, by applying Proposition 1, for each term involving $\lambda^t$ and $\lambda^{t+1}$, the conditioning on the other nodes can be dropped and we have

$$\|\bar P[X_i|X_{\partial B_l(i)}=\lambda^t] - \bar P[X_i|X_{\partial B_l(i)}=\lambda^{t+1}]\|_1 \le 2(\tanh J_{\max})^l.$$

Collecting all the terms, we obtain inequality (d), since there are $|\partial B_l(i)|$ terms. By the definition of $J^*$ in (65), we have that

$$|\partial B_l(i)| = O\big((\tanh J^*)^{-l}\big).$$

Now substituting $\alpha$ from (66) into the above equation, we have the result. □
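Inequality (d) can be sanity-checked numerically on small trees. The sketch below assumes zero node potentials and a common attractive coupling $J$ on every edge (both are simplifying assumptions, and the helper names are hypothetical). It computes the left-hand side of (c) by brute-force enumeration and compares it with $2|\partial B_l(i)|(\tanh J)^l$. On a path the two coincide, since a single boundary node gives exactly $2(\tanh J)^l$; on a binary tree of depth 2 the bound is strict. This is a numerical illustration, not a substitute for the telescoping argument.

```python
import itertools
import math


def boundary_gap(edges, J, root, boundary):
    """Exact ||P[X_root | X_B = all +] - P[X_root | X_B = all -]||_1.

    Zero-field Ising model with the same coupling J > 0 on every edge,
    computed by brute-force enumeration (only sensible for tiny trees).
    """
    nodes = sorted({u for e in edges for u in e})

    def prob_root_plus(b):
        num = den = 0.0
        for x in itertools.product((-1, 1), repeat=len(nodes)):
            s = dict(zip(nodes, x))
            if any(s[k] != b for k in boundary):
                continue  # keep only configurations consistent with X_B = b
            w = math.exp(J * sum(s[u] * s[v] for u, v in edges))
            den += w
            if s[root] == 1:
                num += w
        return num / den

    # the l1 distance between two laws on {-1, +1} is twice the gap in P(+)
    return 2.0 * abs(prob_root_plus(1) - prob_root_plus(-1))


def telescoping_bound(n_boundary, J, l):
    """Right-hand side of inequality (d): 2 |dB_l(i)| (tanh J)^l."""
    return 2.0 * n_boundary * math.tanh(J) ** l


J = 0.2
path = [(0, 1), (1, 2)]                                    # boundary {2} at l = 2
binary = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)]  # boundary {3, 4, 5, 6}
print(boundary_gap(path, J, 0, [2]), telescoping_bound(1, J, 2))
print(boundary_gap(binary, J, 0, [3, 4, 5, 6]), telescoping_bound(4, J, 2))
```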

We can now obtain the threshold $J^*$ for specific graph ensembles using the above result. Recall that $\mathcal G_{\mathrm{Deg}}(p,\Delta)$ denotes a graph ensemble with maximum degree $\Delta$, $\mathcal G_{\mathrm{ER}}(p,c/p)$ denotes the Erdos-Renyi ensemble, where an edge between any two nodes occurs with probability $c/p$, and $\mathcal G_{\mathrm{Watts}}(p,d,c/p)$ denotes the small-world graph, which is the union of a $d$-dimensional grid and an Erdos-Renyi graph with parameter $c$. Recall that $\alpha := \tanh J_{\max}/\tanh J^*$. We have the following result.

Corollary 1 (Threshold $J^*$ for Deterministic Graph Families). We have the following results for various graph families:

1. For any graph ensemble $\mathcal G_{\mathrm{Deg}}(p,\Delta)$ with maximum degree $\Delta$, (63) holds for all $l$ and (65) simplifies to

$$J^*_{\mathrm{Deg}} = \infty. \tag{68}$$

In particular, for every Ising model Markov on a $\Delta$-degree bounded graph,

$$\max_{\substack{(i,j)\notin G\\ x_S\in\mathcal X^{|S|}}}\|P[X_i|X_j=+,X_S=x_S] - P[X_i|X_j=-,X_S=x_S]\|_1 = 0, \tag{69}$$

where $S$ is the exact separator between $i$ and $j$.
2. For the girth-bounded ensemble $\mathcal G_{\mathrm{Girth}}(p;g,\Delta)$, when $2l<g$, the threshold for (63) is given by

$$J^*_{\mathrm{Girth}} = \operatorname{atanh}\Big(\frac{1}{\Delta}\Big). \tag{70}$$

In particular, in this regime, every Ising model Markov on a graph $G\in\mathcal G_{\mathrm{Girth}}(p;g,\Delta)$ satisfies

$$\max_{\substack{(i,j)\notin G\\ x_{S_l}\in\mathcal X^{|S_l|}}}\|P[X_i|X_j=+,X_{S_l}=x_{S_l}] - P[X_i|X_j=-,X_{S_l}=x_{S_l}]\|_1 \le 2\alpha^l \tag{71}$$

when $2l<g$, where $g$ is the girth of the graph, and $S_l := S(i,j;G,l)$ is the minimal $l$-local separator between $i$ and $j$ and satisfies $|S_l|\le 1$.

We provide probabilistic bounds for random graph families. 

Corollary 2 (Threshold $J^*$ for Random Graph Families). We have the following results for various graph families:

1. For the random regular graphs $\mathcal G_{\mathrm{Reg}}(p,\Delta)$, (63) is satisfied when $l = O(\log_{\Delta-1}p)$ and $\Delta = O(\mathrm{poly}\log p)$, and the threshold is given by

$$J^*_{\mathrm{Reg}} = \operatorname{atanh}\Big(\frac{1}{\Delta}\Big). \tag{72}$$

In particular, in this regime, for every Ising model Markov on a $\Delta$-random regular graph,
when I < 0.5(0.25pA + 0.5 - A^), with probability at least 1 - A^'"2(pA - AA"^ - 8/)"(^'"^\ 
we have

$$\max_{\substack{(i,j)\notin G\\ x_{S_l}\in\mathcal X^{|S_l|}}}\|P[X_i|X_j=+,X_{S_l}=x_{S_l}] - P[X_i|X_j=-,X_{S_l}=x_{S_l}]\|_1 \le 4\alpha^l, \tag{73}$$

where $S_l := S(i,j;G,l)$ is the minimal $l$-local separator between $i$ and $j$ and satisfies $|S_l|\le 2$.

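2. For both the Erdos-Renyi ensemble $\mathcal G_{\mathrm{ER}}(p,c/p)$ and the small-world graph ensemble $\mathcal G_{\mathrm{Watts}}(p,d,c/p)$, (63) holds when $l<\frac{\log p}{4\log c}$ and $c = O(\mathrm{poly}\log p)$, with thresholds given by

$$J^*_{\mathrm{ER}} = J^*_{\mathrm{Watts}} = \operatorname{atanh}\Big(\frac{1}{c}\Big). \tag{74}$$

In particular, in this regime, when $l<\frac{\log p}{4\log c}$ and $1<c=O(\mathrm{poly}\log p)$, with probability at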
least 1 — le^^'^^p~'^'^ — /!c^'~^p~^, we have 

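$$\max_{\substack{(i,j)\notin G\\ x_{S_l}\in\mathcal X^{|S_l|}}}\|P[X_i|X_j=+,X_{S_l}=x_{S_l}] - P[X_i|X_j=-,X_{S_l}=x_{S_l}]\|_1 \le 8l\alpha^l\log p, \tag{75}$$

and $S_l := S(i,j;G,l)$ is the minimal $l$-local separator between $i$ and $j$ and satisfies $|S_l|\le 2$ for the Erdos-Renyi ensemble $\mathcal G_{\mathrm{ER}}(p,c/p)$ and $|S_l|\le d+2$ for the small-world graph ensemble $\mathcal G_{\mathrm{Watts}}(p,d,c/p)$.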

Remarks: 
1. Comparing (68), (70) and (72), we note that for the degree-bounded ensemble $J^*_{\mathrm{Deg}} = \infty$, meaning that we do not place any restriction on the maximum potential $J_{\max}$, while for the girth-bounded ensemble and the random regular ensemble $J^*_{\mathrm{Girth}} = J^*_{\mathrm{Reg}} = \operatorname{atanh}(1/\Delta)$. This is because the minimal $l$-local separators are different for these ensembles. For $\mathcal G_{\mathrm{Deg}}(p,\Delta)$, it has cardinality $\Delta$ and thus forms an exact separator. On the other hand, for $\mathcal G_{\mathrm{Girth}}(p;g,\Delta)$ and $\mathcal G_{\mathrm{Reg}}(p,\Delta)$, the minimal $l$-local separators have cardinalities 1 and 2 when $2l<g$ and $l = O(\log_{\Delta-1}p)$, respectively, and thus do not form exact separators. Hence, the threshold $J^*$ depends on whether exact or approximate separators are used for conditioning.

2. Comparing the thresholds for the random regular ensemble in (72) and the Erdos-Renyi ensemble in (74), we see that $J^*_{\mathrm{ER}}\ge J^*_{\mathrm{Reg}}$ if we constrain the maximum degrees in the two ensembles to be the same. Recall that the maximum degree of the Erdos-Renyi ensemble is a.a.s. $\Delta = \Theta(\log p\log c/\log\log p)$. Thus, by obtaining the threshold $J^*_{\mathrm{ER}}$ in terms of the average degree $c$ instead of the maximum degree, we have a larger threshold and thus can provide guarantees for structure estimation of Erdos-Renyi graphs for a wider regime of edge potentials.

3. Comparing the thresholds for the Erdos-Renyi ensemble $\mathcal G_{\mathrm{ER}}(p,c/p)$ and the small-world ensemble $\mathcal G_{\mathrm{Watts}}(p,d,c/p)$ in (74), we see that $J^*_{\mathrm{ER}} = J^*_{\mathrm{Watts}}$, but note that the minimal $l$-local separators are different for these two ensembles. For the Erdos-Renyi ensemble, it has a cardinality of two when $l<\frac{\log p}{4\log c}$, as discussed above. For the small-world ensemble, which is the union of a $d$-dimensional grid and an Erdos-Renyi graph, the minimal $l$-local separator has a cardinality of $d+2$ when $l<\frac{\log p}{4\log c}$, and it forms an exact separator on the grid. Thus, for the small-world graphs, we require a threshold $J^*_{\mathrm{Watts}}$ such that the long paths on the Erdos-Renyi subgraph have a decaying effect, leading to the same threshold on the edge potentials ($J^*_{\mathrm{Watts}} = J^*_{\mathrm{ER}}$).
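The thresholds in Corollaries 1 and 2 are closed-form, so the rate $\alpha$ in (66) and the condition (64) are easy to evaluate. The sketch below encodes (68), (70), (72) and (74) as written above, i.e., $J^*_{\mathrm{Girth}} = J^*_{\mathrm{Reg}} = \operatorname{atanh}(1/\Delta)$ and $J^*_{\mathrm{ER}} = J^*_{\mathrm{Watts}} = \operatorname{atanh}(1/c)$; the function names and the string labels for the families are hypothetical. It also illustrates Remark 2: when $c<\Delta$, the Erdos-Renyi threshold exceeds the one based on the maximum degree.

```python
import math


def conditional_threshold(family, max_degree=None, avg_degree=None):
    """J* of Corollaries 1 and 2, following (68), (70), (72) and (74)."""
    if family == "deg":             # exact separators: no restriction, (68)
        return math.inf
    if family in ("girth", "reg"):  # (70) and (72); requires max_degree >= 2
        return math.atanh(1.0 / max_degree)
    if family in ("er", "watts"):   # (74); requires avg_degree > 1
        return math.atanh(1.0 / avg_degree)
    raise ValueError(f"unknown family: {family!r}")


def decay_rate(j_max, j_star):
    """alpha = tanh(J_max) / tanh(J*) as in (66); condition (64) holds iff alpha < 1."""
    return math.tanh(j_max) / math.tanh(j_star)


for family, kwargs in [("girth", {"max_degree": 3}), ("reg", {"max_degree": 3}),
                       ("er", {"avg_degree": 2.5})]:
    j_star = conditional_threshold(family, **kwargs)
    print(family, round(j_star, 4), round(decay_rate(0.2, j_star), 4))
```

Since $\tanh J^* = 1/\Delta$ for the girth-bounded and random regular families, $\alpha$ reduces to $\Delta\tanh J_{\max}$ there.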

Proof: The result in Eqn. (68) is from the definition of graphical models: the minimal $l$-local separator for the $\mathcal G_{\mathrm{Deg}}(p,\Delta)$ ensemble has size $\Delta$ for all $l\in\mathbb N$. This implies that $T_{\mathrm{saw}}(i;F'_{S_l})$ has no edges and thus $J^*_{\mathrm{Deg}}$ is infinite.

The result in Eqn. (70) is obtained from the fact that the $l$-local separator is of size 1 when $2l<g$, since we do not encounter any cycles. In this case, we can bound the neighborhood of $T_{\mathrm{saw}}(i;F'_{S_l})$ via $T_{\mathrm{saw}}(i;G)$ and, using Property (5) in Theorem 6, we have the result.

For the result in Eqn. (72), note that the size of the minimal $l$-local separator for $\mathcal G_{\mathrm{Reg}}(p,\Delta)$ is at most 2 when $l = O(\log_{\Delta-1}p)$ [18, p. 107]. In this case, we can bound the neighborhood of $T_{\mathrm{saw}}(i;F'_{S_l})$ via $T_{\mathrm{saw}}(i;G)$ and, using Property (5) in Theorem 6, we have the result. For the result in (73), we appeal to [41, Thm. 3] and derive the probability of two cycles, each of length at most $l$, overlapping with one another.

For the result in (74), we appeal to [19, Lemma 1] that with probability at least 1 — /e^^^^p^^'^, 
for all $l\in\mathbb N$, when $c>1$,

$$\max_{i\in V}|B_l(i)| \le 2lc^l\log p. \tag{76}$$

When / < ^°£f^ , with probability at least 1 — /!c^'+^p~^ [3, Lemma 3], there is at most one cycle in 

$B_l(i)$ for all $i\in V$. From Lemma 3, we have the result. When $c = O(\mathrm{poly}\log p)$, we have $\frac{\log p}{4\log c} = \omega(1)$, and thus the threshold $J^*_{\mathrm{ER}}$ holds.

For the small-world graph ensemble $\mathcal G_{\mathrm{Watts}}(p,d,c/p)$, which is the union of the $d$-dimensional grid and an Erdos-Renyi graph, the size of the minimal $l$-local separator is $d+2$ when $l<\frac{\log p}{4\log c}$. Since $F'_{S_l}$ is dominated by the Erdos-Renyi graph, the result holds. □
B.1.1. Uniqueness Regime. We now relate the conditional uniqueness regime to the well-known notion of the uniqueness regime* of an Ising model.

Intuitively, in the uniqueness regime, as the number of nodes $p\to\infty$, any marginal distribution of variables in a local neighborhood of the graph is asymptotically independent of faraway variables. We formally define it below. Recall that we say $f(p) = \tilde O(g(p))$ if $f(p)\le M g(p)\log p$ for some constant $M$ and $p>p_0$, and $F_l(i;G)$ denotes the spanning subgraph of the $l$-hop neighborhood of node $i$.

Definition 4 (Uniqueness Regime). A discrete graphical model $P$ Markov on graph $G\sim\mathcal G(p)$ is in the uniqueness regime if there exists $\alpha\in(0,1)$ such that for a.e. $G$ and all $l\in\mathbb N$,

$$\max_{i\in V}\|P[X_i;G] - P[X_i;F_l(i;G)]\|_1 = \tilde O(\alpha^l). \tag{77}$$

Comparing the above definition of the uniqueness regime and the conditional uniqueness regime in Definition 3, we note that the requirement for the uniqueness regime is stronger. This is because for the uniqueness regime, we require that the "faraway" nodes have a decaying effect on node marginal distributions, while for conditional uniqueness, we only require it upon conditioning on local separators. Note that conditioning itself removes the effect of a subset of "faraway" nodes and thus, conditional uniqueness is a weaker requirement. The notion of the uniqueness regime is well studied (see [28, 43]) and has many implications. For instance, the mixing time of Gibbs sampling is polynomial (in the number of nodes) in the uniqueness regime.

We now note a sufficient condition for the uniqueness condition in (77), along the lines of the analysis in the previous section, by requiring the maximum absolute edge potential of the Ising model to satisfy

$$J_{\max} < \tilde J^*, \tag{78}$$

where the threshold $\tilde J^*\in\mathbb R^+$ is the largest value which satisfies, for all $l\in\mathbb N$,

$$\max_{i\in V}|\partial B_l(i;T_{\mathrm{saw}}(i;G))| = O\big((\tanh\tilde J^*)^{-l}\big). \tag{79}$$

The proof follows along similar lines as that of Lemma 4 and is omitted.

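Along the lines of Corollary 1, we can obtain the threshold $\tilde J^*$ in explicit form for many graph families. Recall that $\mathcal G_{\mathrm{Deg}}(p,\Delta)$ denotes any graph ensemble with maximum degree $\Delta$ and $\mathcal G_{\mathrm{ER}}(p,c/p)$ denotes the Erdos-Renyi ensemble, where an edge between any two nodes occurs with probability $c/p$.

Corollary 3 (Threshold for Uniqueness). For a degree-bounded graph ensemble $\mathcal G_{\mathrm{Deg}}(p,\Delta)$, (79) simplifies to

$$\tilde J^*_{\mathrm{Deg}} = \operatorname{atanh}\Big(\frac{1}{\Delta}\Big). \tag{80}$$

The above threshold can be improved for the Erdos-Renyi ensemble $\mathcal G_{\mathrm{ER}}(p,c/p)$ as

$$\tilde J^*_{\mathrm{ER}} = \operatorname{atanh}\Big(\frac{1}{c}\Big),\quad c = O(\mathrm{poly}\log p). \tag{81}$$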



* For the uniqueness regime, we consider the notion of weak spatial mixing and restrict attention to exponential decay of correlations. Refer to [28, 43] for other notions of correlation decay.
Remarks:

Comparing the thresholds $J^*$ and $\tilde J^*$ for conditional uniqueness and uniqueness, we note that $J^*\ge\tilde J^*$. The difference between $J^*$ and $\tilde J^*$ is largest under exact separation. For instance, in a $(\Delta-1)$-regular tree with degree $\Delta$, the uniqueness threshold is $\tilde J^* = 1/\Delta$, while the conditional uniqueness threshold is $J^* = \infty$ with $\eta = 1$, since upon (exact) separation there is no effect of faraway nodes. Thus, our criterion of conditional uniqueness is much weaker than the usual notion of uniqueness. This implies that we can guarantee efficient structure estimation in high dimensions for a wide range of models.

B.2. Conditional Variation Distance Between Non-Neighbors. Recall that

$$\nu_{i|j;S} := \min_{x_S\in\{-1,+1\}^{|S|}}\nu\big(P(X_i|X_j=+,X_S=x_S),\,P(X_i|X_j=-,X_S=x_S)\big), \tag{82}$$

where $\nu(\cdot,\cdot)$ denotes the total variation distance. Using the notion of the conditional uniqueness regime from Section B.1, we immediately obtain a bound for the conditional variation distance between non-neighbors of an Ising model, when the conditioning set is an $l$-local separator.
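For small models the quantity in (82) can be evaluated exactly by enumerating all $2^p$ configurations. The following Python sketch does this for a four-node cycle with a uniform edge potential; `ising_pmf`, `prob_plus` and `cond_variation_distance` are hypothetical helpers, and the brute-force approach is meant only for illustration, since its cost is exponential in $p$. Conditioning on the exact separator $\{1,3\}$ drives $\nu_{0|2;S}$ to zero, conditioning on node 1 alone does not, and for the neighbors 0 and 1 the distance stays bounded away from zero.

```python
import itertools
import math


def ising_pmf(p, J, h=None):
    """Exact pmf on {-1,+1}^p with P(x) proportional to
    exp(sum_{(i,j)} J_ij x_i x_j + sum_i h_i x_i)."""
    h = h or {}
    weights = {}
    for x in itertools.product((-1, 1), repeat=p):
        energy = sum(Jij * x[i] * x[j] for (i, j), Jij in J.items())
        energy += sum(hi * x[i] for i, hi in h.items())
        weights[x] = math.exp(energy)
    Z = sum(weights.values())
    return {x: w / Z for x, w in weights.items()}


def prob_plus(P, i, fixed):
    """P(X_i = +1 | X_k = fixed[k] for every k in fixed)."""
    num = den = 0.0
    for x, px in P.items():
        if all(x[k] == v for k, v in fixed.items()):
            den += px
            num += px if x[i] == 1 else 0.0
    return num / den


def cond_variation_distance(P, i, j, S):
    """nu_{i|j;S} of (82): minimum over x_S of the total variation distance
    between P(X_i | X_j=+, X_S=x_S) and P(X_i | X_j=-, X_S=x_S)."""
    best = math.inf
    for xs in itertools.product((-1, 1), repeat=len(S)):
        fixed = dict(zip(S, xs))
        plus = prob_plus(P, i, {**fixed, j: 1})
        minus = prob_plus(P, i, {**fixed, j: -1})
        best = min(best, abs(plus - minus))  # TV of two laws on {-1, +1}
    return best


cycle = ising_pmf(4, {(0, 1): 0.5, (1, 2): 0.5, (2, 3): 0.5, (3, 0): 0.5})
print(cond_variation_distance(cycle, 0, 2, [1, 3]))  # exact separator: zero up to rounding
print(cond_variation_distance(cycle, 0, 2, [1]))     # not a separator: strictly positive
print(cond_variation_distance(cycle, 0, 1, [2, 3]))  # neighbours: equals tanh(1)/2 here
```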

Lemma 5 (Conditional Variation Distance Between Non-Neighbors). Given an Ising model satisfying the conditional uniqueness regime according to Definition 3, for graphs satisfying the $(\eta,\gamma)$-local separation property with $\eta = O(1)$, we have

$$\nu_{\max}(p;\eta) := \max_{(i,j)\notin G}\ \min_{|S|\le\eta}\ \nu_{i|j;S} = \tilde O(\alpha^l). \tag{83}$$

B.3. Conditional Variation Distance Between Neighbors. We now provide a lower bound on the conditional variation distance between neighbors. This implies that we can distinguish edges and non-edges through conditional variation distance thresholding. We first provide explicit bounds for special cases such as attractive models. Using the theory of analytic functions, this implies that the bound also holds for generic values of edge potentials.

B.3.1. Attractive Models. We first carry out the analysis for attractive models ($J_{i,j}>0$ for all $(i,j)\in G$).

Proposition 2 (Variation Distance between Neighbors). For attractive Ising models Markov on graph $G$ with maximum degree $\Delta$, having edge potentials $J_{\max}\ge J_{i,j}\ge J_{\min}>0$ and node potentials $0\le h_i\le h_{\max}$, for any set $S\subset V\setminus\{i,j\}$,

$$\min_{x_S\in\mathcal X^{|S|}}\nu_{i|j;S} \ge \frac12\big(\tanh(J_{\min}+h'_{\max}) + \tanh(J_{\min}-h'_{\max})\big) > 0, \tag{84}$$

where $h'_{\max}$ is the modified node potential due to conditioning and marginalization.
Proof: Using the self-avoiding walk tree construction, we have, for any $x_S\in\mathcal X^{|S|}$,

$$\begin{aligned}
&\nu(P[X_i|X_j=+,x_S],P[X_i|X_j=-,x_S])\\
&\overset{(a)}{=} \nu\big(P[X_i|X_{\mathcal U(j)}=+,X_{\mathcal U(S)},X_A;T_{\mathrm{saw}}(i;G)],\,P[X_i|X_{\mathcal U(j)}=-,X_{\mathcal U(S)},X_A;T_{\mathrm{saw}}(i;G)]\big)\\
&\overset{(b)}{\ge} \nu\big(P[X_i|X_{j_1}=+,X_{\mathcal U(S)},X_A;T_{\mathrm{saw}}(i;G)],\,P[X_i|X_{j_1}=-,X_{\mathcal U(S)},X_A;T_{\mathrm{saw}}(i;G)]\big)\\
&\overset{(c)}{=} \frac12\big(\tanh(J_{i,j}+h'_i) + \tanh(J_{i,j}-h'_i)\big),
\end{aligned}$$

where equality (a) is from the self-avoiding walk tree construction $T_{\mathrm{saw}}(i;G)$; inequality (b) is true for attractive models, where $j_1$ refers to the copy of node $j$ in $T_{\mathrm{saw}}(i;G)$ occurring as a neighbor of $i$; and equality (c) follows from the fact that the effect of the terminal nodes $A$, the conditioning set $S$ and the marginalization over the other nodes is to change the node potential of $i$ to $h'_i$. □
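The final expression in this proof is the variation distance across a single edge once the rest of the tree has been absorbed into a node potential. The sketch below (hypothetical function names) checks it numerically: for a two-node model with coupling $J$ and node potential $h$ on $X_i$, the exact total variation distance equals $\frac12(\tanh(J+h)+\tanh(J-h))$, which can also be written as $\sinh(2J)/(\cosh(2J)+\cosh(2h))$. The second form shows that the bound in (84) decreases as $|h'_{\max}|$ grows.

```python
import math


def two_node_tv(J, h):
    """Exact TV between P(X_i | X_j=+) and P(X_i | X_j=-) when the only
    interaction of X_i is the edge (i, j) with potential J and X_i carries
    node potential h."""
    def p_plus(xj):
        up = math.exp(J * xj + h)     # unnormalised weight of X_i = +1
        down = math.exp(-J * xj - h)  # unnormalised weight of X_i = -1
        return up / (up + down)
    return abs(p_plus(1) - p_plus(-1))


def prop2_bound(J, h):
    """The expression (1/2)(tanh(J + h) + tanh(J - h)) appearing in (84)."""
    return 0.5 * (math.tanh(J + h) + math.tanh(J - h))


for h in (0.0, 0.5, 1.5):
    print(h, two_node_tv(0.4, h), prop2_bound(0.4, h))
```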

B.3.2. Generic Edge Potentials. When the Ising model is not necessarily attractive, it is harder 
to obtain lower bounds for conditional variation distance between neighbors, for any conditioning 
set. Note that the case where the neighbors are marginally independent belongs to the class of 
non-attractive models, and in this case, our method fails to recover the edge. We now show that 
such instances, where our method fails, form a set of Lebesgue measure zero, and that the bound 
established for attractive models also holds for general models under generic edge potentials. 

We first note the following result on analytic functions [30, Lemma 2]. 

Lemma 6 (Property of Analytic Functions). For an analytic function $f(\mathbf y)$ for $\mathbf y\in D\subset\mathbb R^m$, if $f$ is non-trivial, i.e., there exists $\mathbf y_0\in D$ such that $f(\mathbf y_0)\neq 0$, then the set where $f$ vanishes has Lebesgue measure zero.

Since the conditional variation distance $\nu_{i|j;S}$ is an analytic function of the edge potentials $\mathbf J := [J_{e_1},\ldots,J_{e_m}]$, we have the following result.

Proposition 3 (Variation Distance under Generic Potentials). For an Ising model Markov on graph $G$ with edge potentials $|J_{i,j}|\ge J_{\min}$, we have for any $S\subset V\setminus\{i,j\}$,

$$\min_{\substack{(i,j)\in G\\ x_S\in\mathcal X^{|S|}}}\nu_{i|j;S} = \Omega(J_{\min}). \tag{85}$$

Proof: We have that the function $f(\mathbf J) := \nu_{i|j;S} - \kappa J_{\min}(\mathbf J)$ is an analytic function of the edge potentials $\mathbf J := [J_{e_1},\ldots,J_{e_m}]$, for a suitable constant $\kappa$. Since $f(\mathbf J)>0$ for an attractive model ($J_{i,j}>0$) and a suitable constant $\kappa>0$, Lemma 6 implies that the set of edge potentials $\mathbf J$ where $f(\cdot)$ vanishes is of measure zero. Thus, for generic edge potentials, $\nu_{i|j;S} = \Omega(J_{\min})$. □
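The following constructed example (not taken from this paper) illustrates why Proposition 3 is stated for generic potentials. On a triangle $i,j,k$ with zero node potentials, $J_{i,k} = K$, $J_{k,j} = -K$ and $J_{i,j} = \operatorname{atanh}(\tanh^2 K) = \frac12\log\cosh(2K)$, the indirect path through $k$ exactly cancels the direct edge, so $X_i$ and $X_j$ are marginally independent and $\nu_{i|j;\emptyset} = 0$; a test that minimizes over conditioning sets including $S=\emptyset$ would miss this edge. The cancellation is confined to a measure-zero set: perturbing $J_{i,j}$ or conditioning on $X_k$ restores a positive distance. The function name is hypothetical.

```python
import itertools
import math


def triangle_tv(J, K, x_k=None):
    """TV between P(X_i | X_j=+) and P(X_i | X_j=-) on the triangle i=0, j=1, k=2
    with J_ij = J, J_ik = K, J_kj = -K and zero node potentials.
    If x_k is given, X_k = x_k is conditioned on as well (S = {k})."""
    def p_i_plus(x_j):
        num = den = 0.0
        for x_i, xk in itertools.product((-1, 1), repeat=2):
            if x_k is not None and xk != x_k:
                continue
            w = math.exp(J * x_i * x_j + K * x_i * xk - K * xk * x_j)
            den += w
            num += w if x_i == 1 else 0.0
        return num / den
    return abs(p_i_plus(1) - p_i_plus(-1))


K = 0.7
J_cancel = 0.5 * math.log(math.cosh(2 * K))  # equals atanh(tanh(K) ** 2)
print(triangle_tv(J_cancel, K))         # ~0: X_i, X_j marginally independent
print(triangle_tv(J_cancel + 0.05, K))  # a small perturbation restores dependence
print(triangle_tv(J_cancel, K, x_k=1))  # so does conditioning on X_k
```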

B.3.3. Graphs with Local Paths. In the previous section, we established the bound for generic edge potentials. We now establish a stronger result, namely that the bound holds for all edge potentials for a limited set of graphs: the class of graphs $\mathcal G_{\mathrm{LP}}(p;\eta,\gamma)$ satisfying the $(\eta,\gamma)$-local paths property. Recall that these graphs have at most $\eta$ paths of length less than $\gamma$ between any two nodes.

Lemma 7 (Variation Distance between Neighbors). Under assumptions (A2)-(A3) in Section 3.1, for an Ising model Markov on a graph $G\sim\mathcal G(p;\eta,\gamma)$ satisfying the $(\eta,\gamma)$-local paths property, where the model is in the uniqueness regime according to (77), we have

$$\nu_{i|j;S} = \Omega(J_{\min}),\quad\forall (i,j)\in G,\ S\subset V\setminus\{i,j\},\ |S| = O(1), \tag{86}$$

where $J_{\min}\le|J_{i,j}|\le J_{\max}$ for all $(i,j)\in G$, and there exists a constant $\delta>0$ such that
Proof: Denote the subset of copies of any node $j$ in the self-avoiding walk tree $T_{\mathrm{saw}}(i;G)$ rooted at node $i$ with distance smaller than $\gamma$ as

$$\mathcal U(j;\gamma) := \{j_k\in\mathcal U(j;T_{\mathrm{saw}}(i;G)) : d(i,j_k;T_{\mathrm{saw}}(i;G)) < \gamma\}. \tag{88}$$

We now have

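$$\begin{aligned}
&\nu(P[X_i|X_j=+,x_S],P[X_i|X_j=-,x_S])\\
&\overset{(a)}{=} \nu\big(P[X_i|X_{\mathcal U(j)}=+,X_{\mathcal U(S)},X_A;T_{\mathrm{saw}}(i;G)],\,P[X_i|X_{\mathcal U(j)}=-,X_{\mathcal U(S)},X_A;T_{\mathrm{saw}}(i;G)]\big)\\
&\overset{(b)}{=} \nu\big(P[X_i|X_{\mathcal U(j;\gamma)}=+,X_{\mathcal U(S;\gamma)},X_{A\cap B_\gamma(i)};T_{\mathrm{saw}}(i;G)],\,P[X_i|X_{\mathcal U(j;\gamma)}=-,X_{\mathcal U(S;\gamma)},X_{A\cap B_\gamma(i)};T_{\mathrm{saw}}(i;G)]\big) - O(\alpha^\gamma)\\
&\overset{(c)}{=} \frac12\Big(\tanh\big[|J_{i,j}+J'_{i,j}|+|h'_i|\big] + \tanh\big[|J_{i,j}+J'_{i,j}|-|h'_i|\big]\Big) - O(\alpha^\gamma)\\
&\overset{(d)}{\ge} \frac12\Big(\tanh\big[|J_{\min}-(\eta-1)J_{\max}|+|h'_i|\big] + \tanh\big[|J_{\min}-(\eta-1)J_{\max}|-|h'_i|\big]\Big) - O(\alpha^\gamma)\\
&\overset{(e)}{=} \Omega(\tanh J_{\min}),
\end{aligned}$$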

where equality (a) is from the equivalence of conditional distributions on the self-avoiding walk tree (Theorem 6). For equality (b), recall that $\mathcal U(j;\gamma)$, defined in (88), denotes the copies of node $j$ in $T_{\mathrm{saw}}(i;G)$ which are at distance smaller than $\gamma$ from the root $i$, and note that the uniqueness condition (77) states that the effect of nodes beyond $B_\gamma(i)$ decays as $O(\alpha^\gamma)$. Equality (c) arises from the self-avoiding walk tree configuration. The parameter $h'_i$ is the modified node potential due to conditioning on the nodes in $\mathcal U(S;\gamma)$ and $A\cap B_\gamma(i)$ and marginalization of the other nodes, and it is bounded since we condition on a finite number of nodes. The parameter $J_{i,j}$ is due to the contribution of the direct path (edge) from $i$ to $j$, while $J'_{i,j}$ is the contribution of all other paths from $i$ to $j$ of length less than $\gamma$.

Inequality (d) arises from the $(\eta,\gamma)$-local paths property, which implies that there are at most $\eta$ copies of any node in $T_{\mathrm{saw}}(i;G)$ within distance $\gamma$ from the root (Lemma 3). This implies that the worst case is when one path from $i$ to a copy of $j$ goes through the edge $(i,j)$ with the minimum edge potential (i.e., $J_{i,j} = J_{\min}$), and all the other paths to copies of $j$ have the maximum potential but with the opposite sign, i.e., $J'_{i,j} = -(\eta-1)J_{\max}$. This is because all the other paths
are at least two hops away from i. Equality (e) arises when , _i'?'nj — is bounded away from one 
(and larger than one), and from assumption (A3), we have JmmO~'^ = 25(1). D 

APPENDIX C: SAMPLE-BASED ANALYSIS OF CVDT 

C.1. Concentration of Empirical Variation Distances. We have so far established bounds on the conditional variation distance in graphs with the local-separation property. We now provide concentration results for the empirical variation distance estimated from samples. We use the following result on empirical distributions [59, Thm. 2.1].

Lemma 8 (Guarantees for General Empirical Distribution). The following is true for the empirical distribution $P^n$, obtained using $n$ i.i.d. samples from a discrete distribution $P$:

$$\mathbb P[\nu(P^n,P) > \epsilon] \le 2^{|\mathcal X|}\exp[-2n\epsilon^2].$$
[Figure 4: the levels $\nu_{\min}(p) = \Omega(J_{\min})$ and $\nu_{\max}(p;\eta) = \tilde O(\alpha^l)$, and the threshold $\xi_{n,p}$.]

Fig 4. The threshold $\xi_{n,p}$ in the CVDT algorithm separates edges and non-edges with high probability. $\nu_{\min}$ and $\nu_{\max}$ are defined in (95) and (97). In the above figure, it is assumed that $\nu_{\min} = \Omega(1)$.



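Lemma 9 (Concentration Bounds). Given $n$ i.i.d. samples from $P$, we have for all $\delta>0$,

$$\mathbb P\Big[\max_{\substack{i,j\in V,\ |S|\le\eta\\ S\subset V\setminus\{i,j\}}}|\hat\nu_{i|j;S}-\nu_{i|j;S}| > \delta\Big] \le 2^{\eta+3}p^{\eta+2}\exp\Big[-\frac{nP_{\min}^2\delta^2}{2(\delta+2)^2}\Big]. \tag{90}$$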



Proof: From Lemma 8,

$$\mathbb P\big[\|P^n(X_i,X_S,X_j)-P(X_i,X_S,X_j)\|_1>\delta_1\big] \le 2^{\eta+2}\exp[-n\delta_1^2/2],$$

$$\mathbb P\big[\|P^n(X_S,X_j)-P(X_S,X_j)\|_1>\delta_2\big] \le 2^{\eta+1}\exp[-n\delta_2^2/2].$$

Under the event that $\|P^n(X_i,X_S,X_j)-P(X_i,X_S,X_j)\|_1\le\delta_1$ and $\|P^n(X_S,X_j)-P(X_S,X_j)\|_1\le\delta_2$, we have

$$\|P^n(X_i|X_S=x_S,X_j=x_j)-P(X_i|X_S=x_S,X_j=x_j)\|_1 \le \frac{\delta_1+\delta_2}{P_{\min}-\delta_2}.$$

If we require a bound of $\delta$ for $\|P^n(X_i|X_S=x_S,X_j=x_j)-P(X_i|X_S=x_S,X_j=x_j)\|_1$, we can choose $\delta_2 = \kappa\delta P_{\min}$ and $\delta_1 = P_{\min}\delta(1-\kappa-\kappa\delta)$. Setting $\kappa = 1/(\delta+2)$ gives the optimal exponent. □
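Inverting the bound (90) gives an explicit sample size. The helper below (hypothetical names) simply solves the right-hand side of (90), as stated above, for $n$, returning the smallest $n$ for which that bound is at most a target failure probability $\epsilon$. The result grows only logarithmically in $p$ and, for small $\delta$, roughly as $\delta^{-2}$; with $\delta$ of the order of $J_{\min}$ this is consistent with the $n = \Omega(J_{\min}^{-2}\log p)$ scaling used in Section C.2.

```python
import math


def lemma9_tail(n, p, eta, delta, p_min):
    """Right-hand side of (90) for n samples."""
    log_rhs = ((eta + 3) * math.log(2) + (eta + 2) * math.log(p)
               - n * (p_min * delta) ** 2 / (2 * (delta + 2) ** 2))
    return math.exp(log_rhs)


def lemma9_sample_size(p, eta, delta, p_min, eps):
    """Smallest integer n for which the right-hand side of (90) is at most eps."""
    log_prefactor = (eta + 3) * math.log(2) + (eta + 2) * math.log(p)
    n = 2 * (delta + 2) ** 2 * (log_prefactor + math.log(1 / eps)) / (p_min * delta) ** 2
    return math.ceil(n)


for p in (100, 10_000, 1_000_000):
    print(p, lemma9_sample_size(p, eta=2, delta=0.1, p_min=0.05, eps=0.05))
```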

C.2. Asymptotic Guarantees for CVDT. We first provide rough asymptotic arguments for recovery under CVDT. We sharpen them to finite sample complexity results in Section 3.2.2. For any $(i,j)\notin G_p$, define the event

$$\mathcal F_1(i,j;\{x^n\},G_p) := \{\hat\nu_{i|j;S} > \xi_{n,p}\}, \tag{91}$$

where $\xi_{n,p}$ is the threshold in (20). Similarly, for any edge $(i,j)\in G_p$, define the event

$$\mathcal F_2(i,j;\{x^n\},G_p) := \{\hat\nu_{i|j;S} < \xi_{n,p}\}. \tag{92}$$

The probability of error resulting from CVDT can thus be bounded by the two types of errors,

$$\mathbb P[\mathrm{CVDT}(\{x^n\};\xi_{n,p};\eta)\neq G_p] \le \mathbb P\Big[\bigcup_{(i,j)\in G_p}\mathcal F_2(i,j;\{x^n\},G_p)\Big] + \mathbb P\Big[\bigcup_{(i,j)\notin G_p}\mathcal F_1(i,j;\{x^n\},G_p)\Big]. \tag{93}$$

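For the first term, applying the concentration result in (90) of Lemma 9,

$$\mathbb P\Big[\bigcup_{(i,j)\in G_p}\mathcal F_2(i,j;\{x^n\},G_p)\Big] = O\big(p^{\eta+2}\exp[-n\,O((\nu_{\min}-\xi_{n,p})^2)]\big), \tag{94}$$

where

$$\nu_{\min} := \min_{(i,j)\in G_p}\ \min_{\substack{|S|\le\eta\\ S\subset V\setminus\{i,j\}}}\nu_{i|j;S} = \Omega(J_{\min}), \tag{95}$$

from Lemma 7. Since $\xi_{n,p} = o(\nu_{\min})$, (94) is $o(1)$ when $n = \Omega(J_{\min}^{-2}\log p)$. For the second term in (93),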



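$$\mathbb P\Big[\bigcup_{(i,j)\notin G_p}\mathcal F_1(i,j;\{x^n\},G_p)\Big] = O\big(p^{\eta+2}\exp[-n\,O((\xi_{n,p}-\nu_{\max})^2)]\big), \tag{96}$$

where

$$\nu_{\max}(p;\eta) := \max_{(i,j)\notin G_p}\ \min_{\substack{|S|\le\eta\\ S\subset V\setminus\{i,j\}}}\nu_{i|j;S} = \tilde O(\alpha^l), \tag{97}$$

from (83). For the choice of $\xi_{n,p}$ in (20), (96) is $o(1)$. □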



C.3. PAC Guarantees for CVDT. We now sharpen the results of the previous section to provide finite sample complexity bounds. Recall that

$$\nu_{\max}(p;\eta) := \max_{(i,j)\notin G_p}\ \min_{\substack{|S|\le\eta\\ S\subset V\setminus\{i,j\}}}\nu_{i|j;S}.$$

Given a fixed $\delta>0$, recall that we choose the threshold $\xi_{n,p}$ as

$$\xi_{n,p}(\delta) = \nu_{\max}(p;\eta) + \delta. \tag{98}$$

Along the lines of the error events (91) and (92) defined in the previous section, and using the concentration bounds in Lemma 9, we have that

$$\mathbb P[\mathrm{CVDT}(\{x^n\};\xi_{n,p}(\delta);\eta)\neq G_p] \le 2^{\eta+3}p^{\eta+2}\exp\Big[-\frac{nP_{\min}^2\delta^2}{2(\delta+2)^2}\Big]. \quad\square$$

The results of Lemma 1 follow from Corollaries 1 and 2.
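A minimal sketch of the thresholding step analyzed above: compute the empirical conditional variation distances $\hat\nu_{i|j;S}$ from samples, minimize over conditioning sets with $|S|\le\eta$, and declare an edge when the minimum exceeds the threshold. The threshold $\xi_{n,p}$ of (20) is not reproduced here; the sketch takes a fixed `xi` as input. It also assumes exact sampling from a tiny zero-field model by enumeration, scores each unordered pair once (with $i<j$), and returns a score of 0 when no conditioning configuration has samples under both signs of $X_j$. All function names are hypothetical.

```python
import itertools
import math
import random
from collections import Counter


def sample_ising(p, J, n, rng):
    """Exact samples from a small zero-field Ising model, by enumeration."""
    configs = list(itertools.product((-1, 1), repeat=p))
    weights = [math.exp(sum(Jij * x[i] * x[j] for (i, j), Jij in J.items()))
               for x in configs]
    return rng.choices(configs, weights=weights, k=n)


def empirical_cvd(samples, i, j, S):
    """Empirical nu_{i|j;S}: min over observed x_S of
    |P^n(X_i=+ | X_j=+, x_S) - P^n(X_i=+ | X_j=-, x_S)|."""
    counts = Counter((x[i], x[j], tuple(x[k] for k in S)) for x in samples)
    best = math.inf
    for xs in itertools.product((-1, 1), repeat=len(S)):
        probs = []
        for xj in (1, -1):
            total = counts[(1, xj, xs)] + counts[(-1, xj, xs)]
            if total == 0:
                break  # this x_S was never observed together with X_j = xj
            probs.append(counts[(1, xj, xs)] / total)
        if len(probs) == 2:
            best = min(best, abs(probs[0] - probs[1]))
    return 0.0 if math.isinf(best) else best


def cvdt(samples, p, eta, xi):
    """Declare (i, j) an edge iff min over |S| <= eta of empirical nu_{i|j;S} > xi."""
    edges = set()
    for i, j in itertools.combinations(range(p), 2):
        others = [k for k in range(p) if k not in (i, j)]
        score = min(empirical_cvd(samples, i, j, S)
                    for size in range(eta + 1)
                    for S in itertools.combinations(others, size))
        if score > xi:
            edges.add((i, j))
    return edges


samples = sample_ising(3, {(0, 1): 1.0, (1, 2): 1.0}, 20_000, random.Random(0))
print(cvdt(samples, 3, eta=1, xi=0.1))
```

On this three-node chain, the non-edge $(0,2)$ is rejected because conditioning on the separator $\{1\}$ brings its empirical distance close to zero, while both edges keep distances well above the threshold for every $S$ with $|S|\le 1$.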
APPENDIX D: NECESSARY CONDITIONS FOR STRUCTURE ESTIMATION

D.1. Erdos-Renyi Random Graphs. This proof is inspired by [11, Thm. 1]. Fix any deterministic estimator $\hat G_p$. Denote by $\mathcal R := \hat G_p((\mathcal X^p)^n)$ the range of the estimator $\hat G_p$. This is the set of all graphs that can be output by the estimator $\hat G_p$. Then we have the sequence of lower bounds:

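$$\begin{aligned}
\mathbb P_{\mathbf x^n,G_p}(\hat G_p\neq G_p) &\overset{(a)}{=} \sum_{g\in\mathcal R}\mathbb P_{\mathbf x|G_p}(\hat G_p\neq G_p|G_p=g)\,\mathbb P_{G_p}(G_p=g)\\
&\qquad + \sum_{g\in\mathcal R^c}\mathbb P_{\mathbf x|G_p}(\hat G_p\neq G_p|G_p=g)\,\mathbb P_{G_p}(G_p=g)\\
&\overset{(b)}{\ge} \sum_{g\in\mathcal R^c}\mathbb P_{\mathbf x|G_p}(\hat G_p\neq G_p|G_p=g)\,\mathbb P_{G_p}(G_p=g)\\
&\overset{(c)}{=} \sum_{g\in\mathcal R^c}\mathbb P_{G_p}(G_p=g)\\
&\overset{(d)}{=} 1-\sum_{g\in\mathcal R}\mathbb P_{G_p}(G_p=g),
\end{aligned} \tag{99}$$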

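where equality (a) comes from the fact that $\mathcal G_p = \mathcal R\cup\mathcal R^c$, inequality (b) lower bounds the sum by the term involving $\mathcal R^c$, equality (c) is due to the fact that $\mathbb P_{\mathbf x|G_p}(\hat G_p\neq G_p|G_p=g) = 1$ for all $g\in\mathcal R^c$, and finally equality (d) holds because $\sum_{g\in\mathcal R}\mathbb P_{G_p}(G_p=g) + \sum_{g\in\mathcal R^c}\mathbb P_{G_p}(G_p=g) = 1$.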
Now we provide an asymptotic upper bound for the term

$$T := \sum_{g\in\mathcal R}\mathbb P_{G_p}(G_p=g).$$

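To do so, first note that $|\mathcal R|\le|\mathcal X^p|^n = 2^{np}$. Furthermore, let $k_g\in\{0,\ldots,\binom p2\}$ denote the number of edges in the graph $g\in\mathcal G_p$. Then,

$$\mathbb P_{G_p}(G_p=g) = \Big(\frac cp\Big)^{k_g}\Big(1-\frac cp\Big)^{\binom p2-k_g}. \tag{100}$$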



Eqn. (100) says that if the probability of edge appearance $c/p<1/2$ (which is the case of interest), then $\mathbb P(G_p=g)$ is maximized at $k_g=0$. In fact, we have the general result that for graphs $g_1,g_2\in\mathcal G_p$,

$$k_{g_1} < k_{g_2} \implies \mathbb P_{G_p}(G_p=g_1) > \mathbb P_{G_p}(G_p=g_2). \tag{101}$$

It is then straightforward to show that the natural number

$$z := \min\Big\{l\in\mathbb N : \sum_{k=0}^{l}\binom{\binom p2}{k} \ge 2^{np}\Big\} \tag{102}$$

is of the order $np/\log p$ (by solving for $l$ in (102)). The quantity $z$ defined in (102) is to be interpreted as the number of edges such that the number of graphs with no more than $z$ edges is at least $2^{np}$. Thus,

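$$\begin{aligned}
T &\overset{(a)}{=} \sum_{g\in\mathcal R}\mathbb P_{G_p}(G_p=g)\\
&\overset{(b)}{\le} \sum_{k=0}^{z}\binom{\binom p2}{k}\Big(\frac cp\Big)^k\Big(1-\frac cp\Big)^{\binom p2-k}\\
&\overset{(c)}{\le} \sum_{k=0}^{O(np/\log p)}\binom{\binom p2}{k}\Big(\frac cp\Big)^k\Big(1-\frac cp\Big)^{\binom p2-k}\\
&\overset{(d)}{\le} \exp\Big[-\frac{1}{(p-1)c}\Big(\frac{(p-1)c}{2}-O\Big(\frac{np}{\log p}\Big)\Big)^2\Big],
\end{aligned}$$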





where (a) follows from the definition of $T$, (b) follows from rewriting $T$ in terms of $z$, the number of edges, and by using (100), (c) follows from (102), and (d) follows from the fact that $\Pr(\mathrm{Bin}(N,q)\le k)\le\exp\big(-\frac{1}{2Nq}(Nq-k)^2\big)$ for $k<Nq$, with the identifications $N = \binom p2$ and $q = c/p$. Finally, we observe from (d) that if $n = ac\log p$ for a sufficiently small constant $a>0$, then $T\to 0$ as $p\to\infty$. Thus, referring back to (99) and noting the arbitrariness of $\hat G_p$, we conclude that if $n\le\epsilon c\log p$ for sufficiently small $\epsilon>0$, then $\mathbb P_{\mathbf x^n,G_p}(\hat G_p\neq G_p)\to 1$. □
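Step (b) can be evaluated exactly for moderate $p$. The sketch below (hypothetical function names) computes $z$ from (102) with exact integer arithmetic and then evaluates the binomial sum in step (b), i.e., $\Pr(\mathrm{Bin}(N,c/p)\le z)$ with $N = \binom p2$, in log space. It does not use the Chernoff step (d), so it evaluates the intermediate bound rather than the final expression. For $p = 200$ and $c = 4$ (so $c\log p\approx 21$), the bound on $T$ is negligible at $n = 2$ and essentially one at $n = 20$, in line with the $c\log p$ scaling of the converse.

```python
import math


def z_threshold(p, n):
    """z of (102): least l with sum_{k<=l} C(N, k) >= 2^(n p), where N = C(p, 2)."""
    N = p * (p - 1) // 2
    target = 2 ** (n * p)
    total, term = 0, 1                     # term holds C(N, k) as an exact integer
    for k in range(N + 1):
        total += term
        if total >= target:
            return k
        term = term * (N - k) // (k + 1)   # C(N, k+1) from C(N, k)
    raise ValueError("2^(n p) exceeds the number of graphs on p nodes")


def t_upper_bound(p, c, n):
    """Bound (b) on T: P(Bin(N, c/p) <= z), the mass of the sparsest graphs."""
    N = p * (p - 1) // 2
    q = c / p
    z = z_threshold(p, n)
    logs = [math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
            + k * math.log(q) + (N - k) * math.log1p(-q) for k in range(z + 1)]
    m = max(logs)                          # log-sum-exp for numerical stability
    return min(1.0, math.exp(m) * sum(math.exp(v - m) for v in logs))


for n in (1, 2, 5, 10, 20):
    print(n, z_threshold(200, n), t_upper_bound(200, 4.0, n))
```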

D.2. Other Graph Families. Proof of Lemma 2: The proof is by counting arguments. For girth-bounded graphs, we prove the result by recursively adding edges. At each stage, one endpoint of the edge can be picked out of $p$ nodes, while the other endpoint cannot be a node within the $g$-hop neighborhood of the first endpoint. The number of such nodes is at least $\Delta_{\min}^{g}$ and at most $\sum_{l=1}^{g}\Delta_{\max}^{l}\le g\Delta_{\max}^{g}$. By recursively adding edges we have the result.

We now consider local-paths graphs. Given a graph $G$, partition its nodes so that nodes in the same part are within graph distance $\gamma$ of each other. The number of parts is at least $m_1 := p/(\gamma\Delta_{\max}^{\gamma})$ and at most $m_2 := p/\Delta_{\min}^{\gamma}$. By the local-paths property, the tree excess of each part (the number of edges beyond those of a spanning tree) is $\eta - 1$. Removing these edges from all parts yields a graph of girth $\gamma$ whose number of edges lies in $[k_1, k_2]$, to which we apply the bound derived above. Finally, in each part, the $\eta - 1$ removed edges can be chosen arbitrarily given the graph of girth $\gamma$.
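A partition of the kind used above, together with the tree excess of each part, can be computed as in the sketch below. The greedy ball-carving rule (balls of radius $\lfloor \gamma/2 \rfloor$ around a centre, so that any two nodes in a part are within graph distance $\gamma$ of each other in $G$) is our own assumption rather than a construction given in the text, and the helper names are hypothetical. The tree excess of a part is computed as the number of edges of the induced subgraph minus its number of nodes plus its number of connected components; node labels are assumed to be integers.

```python
from collections import deque


def ball_partition(adj, gamma):
    """Greedily carve the nodes into parts of pairwise graph distance <= gamma.

    Assumption (ours): each part is the set of still-unassigned nodes within
    floor(gamma / 2) hops of a centre, with hops measured in the full graph.
    """
    radius = gamma // 2
    unassigned = set(adj)
    parts = []
    for centre in sorted(adj):
        if centre not in unassigned:
            continue
        seen = {centre}
        frontier = deque([(centre, 0)])
        while frontier:  # BFS in G up to `radius` hops
            u, d = frontier.popleft()
            if d == radius:
                continue
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    frontier.append((v, d + 1))
        part = seen & unassigned
        unassigned -= part
        parts.append(part)
    return parts


def tree_excess(adj, part):
    """|E| - |V| + (#connected components) of the subgraph induced by `part`."""
    edges = sum(1 for u in part for v in adj[u] if v in part and u < v)
    components, seen = 0, set()
    for s in part:
        if s in seen:
            continue
        components += 1
        stack = [s]
        seen.add(s)
        while stack:  # DFS restricted to the part
            u = stack.pop()
            for v in adj[u]:
                if v in part and v not in seen:
                    seen.add(v)
                    stack.append(v)
    return edges - len(part) + components


if __name__ == "__main__":
    # Hypothetical toy graph: a 6-cycle with a pendant path 5-6-7.
    adj = {0: {1, 5}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4},
           4: {3, 5}, 5: {4, 0, 6}, 6: {5, 7}, 7: {6}}
    for part in ball_partition(adj, gamma=6):
        print(sorted(part), tree_excess(adj, part))  # one part, excess 1
```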

For augmented graphs, the result follows directly by noting that there are $p^{\Theta(dp)}$ regular graphs of degree $d$. □

APPENDIX E: PROPERTIES OF POWER-LAW GRAPHS 

We briefly discuss the local-paths property of power-law random graphs. Recall that in the ensemble $\mathcal{G}_{\mathrm{LP}}(p; \eta, \gamma)$ there are at most $\eta$ paths of length at most $\gamma$ between any two nodes of $G$ or, equivalently, at most $\eta - 1$ overlapping cycles of length smaller than $2\gamma$. We now describe the power-law random graph model; for details, see [20, Ch. 5].

For a given sequence $w = (w_1, w_2, \ldots, w_p)$, the random graph $G = (V, E)$ with $V = \{1, \ldots, p\}$ is generated as follows: for any two nodes $i, j \in V$, the edge $(i, j)$ occurs with probability $w_i w_j \rho$, independently of other edges, where $\rho := (\sum_j w_j)^{-1}$ is the normalization factor. The sequence $w$ is the sequence of expected degrees in the random-graph model. A power-law random graph ensemble $\mathcal{G}_{\mathrm{PL}}(p, \bar{w}, \beta, \Delta)$ has an expected degree sequence given by



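$$
w_i = a\, i^{-\frac{1}{\beta-1}}, \quad \forall\, i \ge i_0, \qquad a := \frac{\beta-2}{\beta-1}\,\bar{w}\,p^{\frac{1}{\beta-1}}, \qquad i_0 := p\left(\frac{\bar{w}(\beta-2)}{\Delta(\beta-1)}\right)^{\beta-1},
$$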
where $\bar{w}$ is the average degree, $\Delta$ is the maximum degree and $\beta > 2$ is the exponent of the power law. A special case of this parameterization is the Erdős-Rényi ensemble $G \sim \mathcal{G}_{\mathrm{ER}}(p, c/p)$, for which $w_i = c$ for all $i \in V$, so that $\bar{w} = c$ and $\beta = \infty$.
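The expected-degree model and the power-law parameterization above can be sampled with the following sketch. Assumptions, labeled here and in the code: the $p$ indices are taken to be $i = i_0, i_0 + 1, \ldots, i_0 + p - 1$; each edge probability is capped at 1; and the function names are hypothetical. With constant weights $w_i = c$ the edge probability reduces to $c^2/(pc) = c/p$, recovering the Erdős-Rényi special case (passed directly as constant weights, since `beta = inf` is not handled by the power-law formula in floating point).

```python
import random


def power_law_weights(p, w_bar, beta, delta):
    """Expected degrees w_i = a * i**(-1/(beta - 1)) with a and i0 as in the text.

    Assumption (ours): the p indices are i = i0, i0 + 1, ..., i0 + p - 1,
    so the largest weight equals delta and the weights are non-increasing.
    """
    if beta <= 2:
        raise ValueError("requires beta > 2")
    a = (beta - 2) / (beta - 1) * w_bar * p ** (1 / (beta - 1))
    i0 = p * (w_bar * (beta - 2) / (delta * (beta - 1))) ** (beta - 1)
    return [a * (i0 + k) ** (-1 / (beta - 1)) for k in range(p)]


def chung_lu_graph(weights, seed=0):
    """Include each pair (i, j), i < j, independently w.p. w_i * w_j * rho.

    rho = 1 / sum(weights). The min(1, .) cap is a safeguard added here.
    """
    rng = random.Random(seed)
    rho = 1.0 / sum(weights)
    p = len(weights)
    edges = []
    for i in range(p):
        for j in range(i + 1, p):
            if rng.random() < min(1.0, weights[i] * weights[j] * rho):
                edges.append((i, j))
    return edges


if __name__ == "__main__":
    w = power_law_weights(p=1000, w_bar=5.0, beta=2.5, delta=50.0)
    edges = chung_lu_graph(w, seed=1)
    print(len(edges), 2 * len(edges) / len(w))  # edge count, empirical mean degree
```

The double loop costs $O(p^2)$ Bernoulli draws, which suffices for small illustrations; NetworkX's `expected_degree_graph` implements the same expected-degree model more efficiently, with self-loops allowed by default.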

Proposition 4 (Local-Paths Property of Power-Law Graphs). The power-law random graph ensemble $\mathcal{G}_{\mathrm{PL}}(p, \bar{w}, \beta, \Delta)$ satisfies the $(\eta, \gamma)$-local paths property a.a.s. when

$$
\bar{w} = o\left(p^{\frac{\eta-1}{2\gamma\eta} - \frac{2}{\beta-1}}\right). \tag{103}
$$

Proof: Let $F = (V_F, E_F)$ be a graph which is the union of at least $\eta$ cycles of length less than $2\gamma$. Then $|E_F| = |V_F| + \eta - 1$ and $|E_F| \le 2\gamma\eta$. By a counting argument, the expected number of subgraphs $F$ in $G \sim \mathcal{G}_{\mathrm{PL}}(p, \bar{w}, \beta, \Delta)$ is bounded by
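
$$
p^{|V_F|}\left(a^{2}\rho\right)^{|E_F|} \le p^{|V_F|}\left(\frac{\bar{w}\,p^{\frac{2}{\beta-1}}}{p}\right)^{|E_F|} = p^{-(\eta-1)}\left(\bar{w}\,p^{\frac{2}{\beta-1}}\right)^{|E_F|},
$$
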
by substituting for $a$ and $\rho$ and using the fact that $|E_F| = |V_F| + \eta - 1$. Thus, the expected number of subgraphs $F$ in $G \sim \mathcal{G}_{\mathrm{PL}}(p, \bar{w}, \beta, \Delta)$ is $o(1)$ when (103) holds, since $|E_F| \le 2\gamma\eta$. By Markov's inequality, the subgraph $F$ does not occur in $G$ a.a.s. □

Thus, (103) relates the average degree $\bar{w}$, the power-law exponent $\beta$, the number of local paths $\eta$ and the threshold $\gamma$ on the length of the paths. In the special case of the Erdős-Rényi ensemble $\mathcal{G}_{\mathrm{ER}}(p, c/p)$, the $(\eta, \gamma)$-local paths property is satisfied when

$$
\eta = 2, \qquad \gamma < \frac{\log p}{4\log c}, \tag{104}
$$
by substituting $\bar{w} = c$ and $\beta = \infty$ in (103).
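As a worked instance of (104), the sketch below evaluates the threshold $\log p/(4\log c)$, the largest integer $\gamma$ strictly below it, and the equivalent form $c < p^{1/(4\gamma)}$ (valid for $c > 1$). The function names are hypothetical; natural logarithms are used, and the ratio does not depend on the base.

```python
import math


def er_gamma_threshold(p, c):
    """Right-hand side of (104): log p / (4 log c); requires c > 1."""
    if c <= 1:
        raise ValueError("requires c > 1")
    return math.log(p) / (4 * math.log(c))


def largest_admissible_gamma(p, c):
    """Largest integer gamma with gamma < log p / (4 log c)."""
    return math.ceil(er_gamma_threshold(p, c)) - 1


def condition_holds(p, c, gamma):
    """Equivalent form of (104) for c > 1: c < p ** (1 / (4 * gamma))."""
    return c < p ** (1 / (4 * gamma))


if __name__ == "__main__":
    for p in (10**3, 10**6, 10**9):
        print(p, largest_admissible_gamma(p, 3.0))  # gamma = 1, 3, 4 respectively
```

The threshold grows only logarithmically in $p$, so for a fixed average degree $c$ the admissible path-length threshold $\gamma$ increases slowly with the number of variables.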